From rainer.rutka at uni-konstanz.de Wed Mar 1 06:30:39 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Wed, 1 Mar 2017 13:30:39 +0100
Subject: [maker-devel] Maker-Error when started with IMPI
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
Message-ID: <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>

Hi Carson.

Again, THANK YOU for your efforts :-)

On 24.02.2017 at 18:30, Carson Holt wrote:
> Specific things.
>
> 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause problems with other MPIs.

OK, I deleted this environment variable. It is not set any more.

> 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has to be compiled for the flavor you are using, so make sure you have a separate installation of MAKER for Intel MPI). Also validate that the mpicc and libmpi.h listed during the MAKER install belong to Intel MPI. Don't just assume they do because you loaded the module. Manually verify the paths during MAKER's setup.

I validated:

UC:[kn at uc1n996 bwhpc-examples]$ module list
Currently Loaded Modulefiles:
 1) compiler/intel/16.0(default)
 2) mpi/impi/5.1.3-intel-16.0(default)

FOR MPICC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpicc
mpicc is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc

FOR LIBMPI:
UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64
UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h

Here I can find an mpi.h but not a libmpi.h. But I think this is OK, because the software was compiled and linked without any errors or missing libs.

> 3.
The error you got previously should not even be possible with the current version of Intel MPI,
> which is why I say that when you called mpiexec, something else (that was not Intel MPI) was launched.
> Easy solution is to give the full path of mpiexec in your job, so you are not relying on PATH to be unaltered in your job.

mpiexec is in the PATH and the right one is/was used, too.

MPIEXEC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec
mpiexec is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
UC:[kn at bwhpc-examples]$

> Do not do -> mpiexec -nc 1 maker
> Do this for example -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker

OK, so I did:

[...]
#MSUB -l nodes=1:ppn=1
#MSUB -l mem=20gb
[...]
echo " "
echo "### Runing Maker example"
echo " "
export OMPI_MCA_mpi_warn_on_fork=0
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker
[...]

> 4. Build and run on the same node for your test. If you build on one node and run on another, you may
> be changing your environment in ways you don't realize that break things. So if you can build and test on
> the same node and it works, then it fails when you test it elsewhere, then you have to track down how your
> environment is changing.

OK, I did. Same node: uc1n996

UNFORTUNATELY I GOT THE SAME ERROR:

[...]
### Runing Maker example
LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-intel-16.0/lib/libmpi.so
OMPI_MCA_mpi_warn_on_fork=0
I_MPI_CPUINFO=proc
I_MPI_PMI_LIBRARY=/opt/bwhpc/common/mpi/openmpi/2.0.1-intel-16.0/lib/libpmi.so
I_MPI_PIN_DOMAIN=node
I_MPI_FABRICS=shm:tcp
I_MPI_HYDRA_IFACE=ib0
mpiexec_uc1n342.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n342.localdomain_kn_pop235844); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
[...]

> --Carson

tbc.
:-) THANX

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* KIM Training
* Scientific Computing/bwHPC-C5
* KIM Base Services, KIM Support
Room: V511
78457 Konstanz
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From rainer.rutka at uni-konstanz.de Wed Mar 1 06:51:05 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Wed, 1 Mar 2017 13:51:05 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
Message-ID: <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>

Sorry, sent wrong e-mail :-(

IGNORE THE FIRST MAIL I SENT!

On 01.03.2017 at 13:30, Rainer Rutka wrote:

Hi Carson.

Again, THANK YOU for your efforts :-)

On 24.02.2017 at 18:30, Carson Holt wrote:
> Specific things.
>
> 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause
> problems with other MPIs.

OK, I deleted this environment variable. It is not set any more.

> 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has
> to be compiled for the flavor you are using, so make sure you have a
> separate installation of MAKER for Intel MPI). Also validate that the
> mpicc and libmpi.h listed during the MAKER install belong to Intel
> MPI. Don't just assume they do because you loaded the module. Manually
> verify the paths during MAKER's setup.
I validated:

UC:[kn at uc1n996 bwhpc-examples]$ module list
Currently Loaded Modulefiles:
 1) compiler/intel/16.0(default)
 2) mpi/impi/5.1.3-intel-16.0(default)

FOR MPICC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpicc
mpicc is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc

FOR LIBMPI:
UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64
UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h

Here I can find an mpi.h but not a libmpi.h. But I think this is OK, because the software was compiled and linked without any errors or missing libs.

> 3. The error you got previously should not even be possible with the
> current version of Intel MPI,
> which is why I say that when you called mpiexec, something else (that
> was not Intel MPI) was launched.
> Easy solution is to give the full path of mpiexec in your job, so you are
> not relying on PATH to be unaltered in your job.

mpiexec is in the PATH and the right one is/was used, too:

MPIEXEC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec
mpiexec is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec

> Do not do -> mpiexec -nc 1 maker
> Do this for example ->
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
> -nc maker

OK, so I did:

[...]
#MSUB -l nodes=1:ppn=1
#MSUB -l mem=20gb
[...]
echo " "
echo "### Runing Maker example"
echo " "
export OMPI_MCA_mpi_warn_on_fork=0
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker
[...]

> 4. Build and run on the same node for your test. If you build on one
> node and run on another, you may
> be changing your environment in ways you don't realize that break
> things.
So if you can build and test on
> the same node and it works, then it fails when you test it elsewhere,
> then you have to track down how your
> environment is changing.

OK, I did. Same node: uc1n996

UNFORTUNATELY I GOT THE SAME ERROR:

[...]
Currently Loaded Modulefiles:
 1) compiler/intel/16.0(default)
 2) mpi/impi/5.1.3-intel-16.0(default)
 3) bio/maker/2.31.8_impi

### Display internal Maker/bwHPC environments...

MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bin
MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bwhpc-examples

### Runing Maker example
OMPI_MCA_mpi_warn_on_fork=0
I_MPI_CPUINFO=proc
I_MPI_PMI_LIBRARY=/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/lib/libmpi.so
I_MPI_PIN_DOMAIN=node
I_MPI_FABRICS=shm:tcp
I_MPI_HYDRA_IFACE=ib0
mpiexec_uc1n326.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n326.localdomain_kn_pop235844); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
### Cleaning up files ... removing unnecessary scratch files ...
[...]

> --Carson

tbc. :-)
THANX

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* KIM Training
* Scientific Computing/bwHPC-C5
* KIM Base Services, KIM Support
Room: V511
78457 Konstanz
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From carsonhh at gmail.com Wed Mar 1 14:32:54 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 13:32:54 -0700
Subject: [maker-devel] SOBA statistics of Maker annotation
In-Reply-To: <2377C5DD-569C-4248-B458-349D7AEA32F5@ucr.edu>
References: <688EB172-FEC8-4995-8AA2-0925AF62201A@ucr.edu> <6551374B-54FF-4047-B7A8-A49327FC0036@gmail.com> <73526BAB-57F8-4A47-AADD-DB6883573EAB@ucr.edu> <2377C5DD-569C-4248-B458-349D7AEA32F5@ucr.edu>
Message-ID: <6E776F59-F71F-49F7-872A-A0E404970C7E@gmail.com>

Perhaps with the way you are counting sequence from the RepeatMasker report you are double counting for repeats that overlap? MAKER reports the command line it uses as part of its STDERR, so you can manually run any step you want outside of MAKER to evaluate.

--Carson

> On Feb 25, 2017, at 10:14 AM, Qihua Liang wrote:
>
> Thank you Barry and Carson!
>
> I compared the SOBA statistics of the RepeatMasker footprint with the report generated by running RepeatMasker alone, and I got 2 different percentages of repeats masked. Running RepeatMasker with myTrained.lib, the repeats masked are 42%. But within the Maker GFF3, the percentage of repeats masked is only ~18%. What may cause such a difference here?
>
> Thanks
> Qihua
>
>> On Feb 21, 2017, at 1:34 PM, Carson Holt wrote:
>>
>> MAKER merges overlapping RepeatMasker results into a single longer feature.
>>
>> --Carson
>>
>>
>>> On Feb 20, 2017, at 1:34 PM, Qihua Liang wrote:
>>>
>>> Hi Carson,
>>>
>>> Thanks for your reply! Now I understand the minimal length of SOBA analysis of Maker gene models in GFF3.
>>>
>>> I am also using SOBA to calculate the statistics of other sources in the GFF3 file, and I have found another strange thing about RepeatMasker annotation and footprint percentage.
>>>
>>> Previously, I ran RepeatMasker outside of Maker once, with my_trained.lib (same as used in Maker), and the output report showed ~42% of bases masked.
>>> In running Maker, I provided both "model_org=all" and "rmlib=my_trained.lib". Under these settings, RepeatMasker should be run twice and the merged results of the two runs will be the output of RepeatMasker in GFF3. I am expecting the bases masked by RepeatMasker in the GFF3 to be more than 42%.
>>>
>>> But in the SOBA calculation, the footprint percentage is only ~18%. Referring to the SOBA paper, footprint is calculated as "non-redundant nucleotide count of all features of a given type". I assume that when SOBA calculates the footprint of RepeatMasker features in GFF3, it should be counting the same thing as the "masked bps" reported by RepeatMasker itself.
>>>
>>> When Maker "combines" the 2 runs of RepeatMasker, is it a merge or an overlapping of 2 RepeatMasker results?
>>> Besides, instead of using SOBA, are there any accessory scripts updated in Maker to calculate the statistics of the annotations?
>>>
>>> Thanks
>>> Qihua
>>>
>>>
>>>> On Feb 19, 2017, at 10:05 PM, Carson Holt wrote:
>>>>
>>>> In GFF3 the CDS and UTR lengths are actually the merge of all CDS or UTR features, but SOBA is reporting each part individually, which may be causing your confusion. This is because SOBA reports per-feature statistics and not merged feature statistics.
>>>>
>>>> CDSs do not have to take up entire exons. For example start/stop codons may cross splice sites and be split across exons (very common). The result is that each part of the split CDS becomes a separate feature. As a result SOBA will treat each one separately. So a single bp CDS here is not abnormal, since the remaining part of the CDS continues on the next exon as a separate line. The exact same is true for UTR.
>>>>
>>>> If you want the merged length of the UTR and CDS, it is best to pull that info out of the _QI= part of the GFF3 attributes for each mRNA.
>>>>
>>>> What about single bp exons? Those cannot occur unless you gave an input GFF3 with predictions that have single bp exons. The predictors like SNAP and Augustus just won't produce them, with one exception. They can potentially produce them for the first/last exon. This is not because the exon is 1 bp, but rather because the predictor only reports the CDS part of the exon. As a result the stop/start codon may have only 1 bp overlapping that exon, but once you add UTR the exon will extend from that point and will no longer be 1 bp in length. But if the UTR never gets added, then you can be left with a partial initial/terminal exon.
>>>>
>>>> However more than likely what you are seeing is just related to how SOBA reports individual feature line stats as opposed to merged stats for CDS and UTR.
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>>> On Feb 18, 2017, at 9:43 AM, Qihua Liang wrote:
>>>>>
>>>>> Dear Maker develop team,
>>>>>
>>>>> I used the SOBA website to calculate the statistics of a Maker annotation, and I found that the minimal length of some Maker features, like CDS, exon, 5' and 3' UTR, can be as short as 1 bp. Features with a length of 1 bp are confusing. When Maker combines different gene models and makes such predictions, how will it accept such abnormal exon/CDS lengths? And are there any parameters in the bopt.ctl or evm.ctl to avoid such abnormal gene models?
>>>>>
>>>>> Thanks
>>>>> Qihua
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>
>>>
>>
>

From carsonhh at gmail.com Wed Mar 1 14:36:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 13:36:17 -0700
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To:
References:
Message-ID:

If you submit too many simultaneous MAKER runs, then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (maker must be configured and compiled to use MPI).

An example MPI launch command for running on 200 CPUs on a cluster ->
mpiexec -n 200 maker 2> maker_mpi1.error

--Carson

> On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>
> Hello:
>
> I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues with MPI, I submitted the Maker jobs several times under the same directory to the HPC. Following the example in the protocol (as shown below), when I submit the jobs I make them background processes with "&", except the first one. Is this necessary when I submit a job to an HPC? I found it took much, much longer than I expected (according to a test on a smaller data set). I am not sure whether running the processes in the background leads to this issue?
>
> The example in the protocol
> % maker 2> maker1.error
> % maker 2> maker2.error &
> % maker 2> maker3.error &
> ......
>
> BTW, will the annotation of a shorter contig (e.g., 500bp) cost ~1/100 of the time that the annotation of a 50000bp contig costs? I am using SNAP for ab initio prediction, with an RNA-seq assembly and protein sequences as evidence.
More than half of my contigs are shorter than 300bp (their total length is only about 5% of the total length of all contigs); I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs.
>
> Thanks
>
> Best
> Quanwei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From qwzhang0601 at gmail.com Wed Mar 1 15:09:30 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 1 Mar 2017 16:09:30 -0500
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com>
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com>
Message-ID:

Thank you. I have submitted my jobs to our server. What I plan to do is this: (1) split the contigs into 50 files; (2) for each contig file, collect the annotation in GFF format and the protein sequences in FASTA format; (3) manually merge the 50 GFF files and the protein sequence files. Is what I am doing also correct?

Best
Quanwei

2017-03-01 15:54 GMT-05:00 Carson Holt:

> If you split into separate files, you can use the -g option to select the
> input file together with the -base option so all output goes to the same
> directory. Because they technically have different input files, this will
> avoid file locking issues. You have to use the -dsindex option at the end
> to rebuild the datastore index, so it looks like a single job. But that is
> one way to get around the issue.
>
> --Carson
>
>
>
> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>
> Thank you. But I met some problems with MPI on our server. So now I split
> my contigs into several files and annotate those files separately. After I
> finish the annotation on each file, I will merge the results.
>
> Thank you for your explanation!
>
> Best
> Quanwei
>
> 2017-03-01 15:36 GMT-05:00 Carson Holt:
>
>> If you submit too many simultaneous MAKER runs, then file locks will start
>> to collide and one run will slow down the others. You should submit fewer
>> simultaneous jobs and instead use MPI (maker must be configured and
>> compiled to use MPI).
>>
>> An example MPI launch command for running on 200 CPUs on a cluster ->
>> mpiexec -n 200 maker 2> maker_mpi1.error
>>
>> --Carson
>>
>>
>>
>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>> >
>> > Hello:
>> >
>> > I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues with MPI, I submitted the Maker jobs several times under the same directory to the HPC. Following the example in the protocol (as shown below), when I submit the jobs I make them background processes with "&", except the first one. Is this necessary when I submit a job to an HPC? I found it took much, much longer than I expected (according to a test on a smaller data set). I am not sure whether running the processes in the background leads to this issue?
>> >
>> > The example in the protocol
>> > % maker 2> maker1.error
>> > % maker 2> maker2.error &
>> > % maker 2> maker3.error &
>> > ......
>> >
>> > BTW, will the annotation of a shorter contig (e.g., 500bp) cost ~1/100 of the time that the annotation of a 50000bp contig costs? I am using SNAP for ab initio prediction, with an RNA-seq assembly and protein sequences as evidence. More than half of my contigs are shorter than 300bp (their total length is only about 5% of the total length of all contigs); I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs.
>> >
>> > Thanks
>> >
>> > Best
>> > Quanwei
>> > _______________________________________________
>> > maker-devel mailing list
>> > maker-devel at box290.bluehost.com
>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Mar 1 15:10:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 14:10:20 -0700
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To:
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com>
Message-ID: <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>

That will work.

--Carson

> On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote:
>
> Thank you. I have submitted my jobs to our server. What I plan to do is this: (1) split the contigs into 50 files; (2) for each contig file, collect the annotation in GFF format and the protein sequences in FASTA format; (3) manually merge the 50 GFF files and the protein sequence files. Is what I am doing also correct?
>
> Best
> Quanwei
>
> 2017-03-01 15:54 GMT-05:00 Carson Holt:
> If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue.
>
> --Carson
>
>
>
>> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>>
>> Thank you. But I met some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation on each file, I will merge the results.
>>
>> Thank you for your explanation!
>>
>> Best
>> Quanwei
>>
>> 2017-03-01 15:36 GMT-05:00 Carson Holt:
>> If you submit too many simultaneous MAKER runs, then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (maker must be configured and compiled to use MPI).
>>
>> An example MPI launch command for running on 200 CPUs on a cluster ->
>> mpiexec -n 200 maker 2> maker_mpi1.error
>>
>> --Carson
>>
>>
>>
>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>> >
>> > Hello:
>> >
>> > I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues with MPI, I submitted the Maker jobs several times under the same directory to the HPC. Following the example in the protocol (as shown below), when I submit the jobs I make them background processes with "&", except the first one. Is this necessary when I submit a job to an HPC? I found it took much, much longer than I expected (according to a test on a smaller data set). I am not sure whether running the processes in the background leads to this issue?
>> >
>> > The example in the protocol
>> > % maker 2> maker1.error
>> > % maker 2> maker2.error &
>> > % maker 2> maker3.error &
>> > ......
>> >
>> > BTW, will the annotation of a shorter contig (e.g., 500bp) cost ~1/100 of the time that the annotation of a 50000bp contig costs? I am using SNAP for ab initio prediction, with an RNA-seq assembly and protein sequences as evidence. More than half of my contigs are shorter than 300bp (their total length is only about 5% of the total length of all contigs); I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs.
>> >
>> > Thanks
>> >
>> > Best
>> > Quanwei
>> > _______________________________________________
>> > maker-devel mailing list
>> > maker-devel at box290.bluehost.com
>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Mar 1 18:43:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 17:43:30 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
Message-ID: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>

Try this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

If both of these fail, there is a chance that the Intel MPI you are using was compiled on a different architecture than the one you are launching it on. In that case the failure indicates a need to reinstall Intel MPI for that architecture.
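
[Editorial sketch, not part of Carson's message.] The "echo Hello" check above can be wrapped in a small shell function so that the same trivial test is applied to each launcher before MAKER is involved at all. This is only a sketch under assumptions: MPI_BIN and try_launcher are hypothetical names introduced here, the default path is the Intel MPI bin directory quoted in this thread, and the only launcher option used is the "-n 2" from the commands above.

```shell
#!/bin/sh
# Assumed variable: directory holding the MPI launchers (taken from this thread).
MPI_BIN="${MPI_BIN:-/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin}"

# try_launcher NAME (hypothetical helper):
# run "NAME -n 2 echo Hello" and print one status line for that launcher.
try_launcher() {
    launcher="$MPI_BIN/$1"
    if [ ! -x "$launcher" ]; then
        echo "$1: not found"
        return 1
    fi
    if "$launcher" -n 2 echo Hello >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: failed"
        return 1
    fi
}

# Check both launchers discussed in this thread; keep going if one fails.
for name in mpiexec mpiexec.hydra; do
    try_launcher "$name" || true
done
```

A launcher that is reported as failed on this trivial command points at the MPI installation or environment rather than at MAKER, which is the reasoning in the paragraph above.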
The following may or may not work if the first two fail:

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

Also send me this file -> perl/lib/MAKER/ConfigData.pm

Thanks,
Carson

> On Mar 1, 2017, at 5:51 AM, Rainer Rutka wrote:
>
>
> Sorry, sent wrong e-mail :-(
>
> IGNORE THE FIRST MAIL I SENT!
>
> On 01.03.2017 at 13:30, Rainer Rutka wrote:
> Hi Carson.
> Again, THANK YOU for your efforts :-)
> On 24.02.2017 at 18:30, Carson Holt wrote:
>> Specific things.
>>
>> 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause
>> problems with other MPIs.
>
> OK, I deleted this environment variable. It is not set any more.
>
>> 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has
>> to be compiled for the flavor you are using, so make sure you have a
>> separate installation of MAKER for Intel MPI). Also validate that the
>> mpicc and libmpi.h listed during the MAKER install belong to Intel
>> MPI. Don't just assume they do because you loaded the module. Manually
>> verify the paths during MAKER's setup.
>
> I validated:
> UC:[kn at uc1n996 bwhpc-examples]$ module list
> Currently Loaded Modulefiles:
> 1) compiler/intel/16.0(default)
> 2) mpi/impi/5.1.3-intel-16.0(default)
> FOR MPICC:
> UC:[kn at uc1n996 bwhpc-examples]$ type mpicc
> mpicc is
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc
> FOR LIBMPI:
> UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64
> UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h
> Here I can find an mpi.h but not a libmpi.h.
But I think this is OK,
> because the software was compiled and linked without any errors or missing libs.
>
>> 3. The error you got previously should not even be possible with the
>> current version of Intel MPI,
>> which is why I say that when you called mpiexec, something else (that
>> was not Intel MPI) was launched.
>> Easy solution is to give the full path of mpiexec in your job, so you are
>> not relying on PATH to be unaltered in your job.
>
> mpiexec is in the PATH and the right one is/was used, too:
> MPIEXEC:
> UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec
> mpiexec is
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
>
>> Do not do -> mpiexec -nc 1 maker
>> Do this for example ->
>> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
>> -nc maker
> OK, so I did:
> [...]
> #MSUB -l nodes=1:ppn=1
> #MSUB -l mem=20gb
> [...]
> echo " "
> echo "### Runing Maker example"
> echo " "
> export OMPI_MCA_mpi_warn_on_fork=0
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
> -nc maker
> [...]
>
>> 4. Build and run on the same node for your test. If you build on one
>> node and run on another, you may
>> be changing your environment in ways you don't realize that break
>> things. So if you can build and test on
>> the same node and it works, then it fails when you test it elsewhere,
>> then you have to track down how your
>> environment is changing.
>
> OK, I did. Same node: uc1n996
> UNFORTUNATELY I GOT THE SAME ERROR:
> [...]
> Currently Loaded Modulefiles:
> 1) compiler/intel/16.0(default)
> 2) mpi/impi/5.1.3-intel-16.0(default)
> 3) bio/maker/2.31.8_impi
>
>
> ### Display internal Maker/bwHPC environments...
>
> MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bin
> MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bwhpc-examples
>
>
> ### Runing Maker example
> OMPI_MCA_mpi_warn_on_fork=0
> I_MPI_CPUINFO=proc
> I_MPI_PMI_LIBRARY=/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/lib/libmpi.so
> I_MPI_PIN_DOMAIN=node
> I_MPI_FABRICS=shm:tcp
> I_MPI_HYDRA_IFACE=ib0
> mpiexec_uc1n326.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n326.localdomain_kn_pop235844); possible causes:
> 1. no mpd is running on this host
> 2. an mpd is running but was started without a "console" (-n option)
> ### Cleaning up files ... removing unnecessary scratch files ...
> [...]
>
>> --Carson
> tbc. :-)
> THANX
>
> --
> Rainer Rutka
> University of Konstanz
> Communication, Information, Media Centre (KIM)
> * KIM Training
> * Scientific Computing/bwHPC-C5
> * KIM Base Services, KIM Support
> Room: V511
> 78457 Konstanz
> +49 7531 88-5413
>

From rainer.rutka at uni-konstanz.de Thu Mar 2 02:41:37 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Thu, 2 Mar 2017 09:41:37 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
Message-ID:

Hi Carson!
On 02.03.2017 at 01:43, Carson Holt wrote:
> Try this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello
> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

Same error(s).

> If both of these fail, there is a chance that the Intel MPI you are using was compiled on a different architecture than the one you are launching it on. In that case the failure indicates a need to reinstall Intel MPI for that architecture.

Yes, they fail.

> The following may or may not work if the first two fail:
> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello

WORKS FINE!

> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

WORKS!

> Also send me this file -> perl/lib/MAKER/ConfigData.pm

Attached to this mail.

> Thanks,
> Carson

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ConfigData.pm
Type: application/x-perl
Size: 5424 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From rainer.rutka at uni-konstanz.de Thu Mar 2 03:07:07 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Thu, 2 Mar 2017 10:07:07 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
Message-ID: <6cd0a8c5-e6a5-a171-5f80-11d193627aeb@uni-konstanz.de>

> The following may or may not work if the first two fail:
> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello
> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

mpirun (not mpiexec) works, too!

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:
From carsonhh at gmail.com Thu Mar 2 11:41:35 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 2 Mar 2017 10:41:35 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
Message-ID: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>

This command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

All that command does is start the launcher and print "Hello". Since it failed, the issue is with your MPI installation (i.e. Intel MPI itself). It would have to be reinstalled and recompiled. I would not be surprised if the issues with the other MPI flavors you tried were for the same reason. They were installed for one architecture/compiler/library set, but you are running them on another one. So they always fail.

The second command was an alternate launcher, but it relies on the same underlying libraries as the first one. So if the first one failed, the second one may fail (it may just happen later on).

So the issue boils down to one thing -> Your MPI is the issue. You need to reinstall/reconfigure, and once you can get your MPI working, you can move on to trying MAKER.

Thanks,
Carson
From mnaymik at tgen.org Thu Mar 2 14:05:22 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Thu, 2 Mar 2017 13:05:22 -0700
Subject: [maker-devel] ThrowNullPointerException()
Message-ID:

I have MAKER running with MPI and I get this error over and over again for every contig. Any ideas?

MAKER WARNING: All old files will be erased before continuing
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: 5239
Length: 1395
#---------------------------------------------------------------------

Error: NCBI C++ Exception:
"/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti

--
*This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Thu Mar 2 14:25:59 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 2 Mar 2017 13:25:59 -0700
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To:
References:
Message-ID: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>

Try reinstalling blast, or upgrade to a newer version of blast.

--Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From d.ence at ufl.edu Fri Mar 3 10:48:34 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Fri, 3 Mar 2017 16:48:34 +0000
Subject: [maker-devel] how to deal with Contigs to run maker?
In-Reply-To: <2017022815435664227911@cau.edu.cn>
References: <2017022815435664227911@cau.edu.cn>
Message-ID: <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>

Hi Chao,

I don't think merging the contigs is a good idea. Unless you actually know the distances (in basepairs) between the contigs, this could lead to many spurious alignments. I think you should leave them separate in your fasta file for repeatmodeler, ab-initio training, and running maker. If you're worried about short contigs in your assembly, you can exclude shorter contigs with the min_contig option in the maker_opts control file.

~Daniel

On Feb 28, 2017, at 2:43 AM, dcg at cau.edu.cn wrote:

Dear sir:
After assembling, I got many contigs and their order in each chromosome.
What I have done is merge these contigs into chromosomes following that order, with 100 Ns inserted between contigs. So I got chr1 chr2...... Then I ran repeatmodeler and the predictor to annotate it.

Could my way reach a high-quality result? Should I use all the contigs to mask repeats and train the predictor?
Is there any better way to do genome-wide annotation?
I'm looking forward to your reply!
Best wishes!

Chao Chao
________________________________
2017.02.28
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Fri Mar 3 11:32:15 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 3 Mar 2017 10:32:15 -0700
Subject: [maker-devel] how to deal with Contigs to run maker?
In-Reply-To: <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>
References: <2017022815435664227911@cau.edu.cn> <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>
Message-ID: <7CF3A765-5A93-42B2-AA28-4596CD25A459@gmail.com>

I agree. Also, a 100bp insert of N's will essentially be ignored by aligners and predictors. They'll jump across it as if it was just an intron, resulting in false merges and bad predictions.

--Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rainer.rutka at uni-konstanz.de Mon Mar 6 02:21:20 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Mon, 6 Mar 2017 09:21:20 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID:

Hi Carson.

Again, thank you for your response.

But - sorry to say - it's not possible that our MPI is corrupt. We have approx. 1,500 users working on our bwUniCluster so far, and 95% of these users use MPI. And all our other software (see: cis-hpc.uni-konstanz.de) is running with our implementations of IMPI/OMPI without any issues.

:-()

On 02.03.2017 at 18:41, Carson Holt wrote:
> This command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello
>
> All that command does is start the launcher and print "Hello". So since it failed, it means the issue is with your MPI installation (i.e. Intel MPI itself).
--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* KIM Training
* Scientific Computing/bwHPC-C5
* KIM Base Services, KIM Support
Room: V511
78457 Konstanz
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:
From carsonhh at gmail.com Mon Mar 6 08:47:51 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 6 Mar 2017 07:47:51 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID: <9B00FB6A-B5F5-4240-AB1E-4CBEEEB63C7F@gmail.com>

I was able to replicate the error as follows ->

1. Intel MPI installed on CentOS kernel 6 (MPI works fine).
2. Upgrade to kernel 7 without reinstalling, and Intel MPI reports the same error as reported by the user.
3. After recompiling Intel MPI on kernel 7 the error goes away.

The proof that there is an issue with your Intel MPI installation is in this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

That command is simply trying to get mpiexec to launch "echo Hello" internally. And it failed. It's as simple as that.
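
A minimal sketch of this check as a shell function, assuming a POSIX shell and assuming that a broken launcher exits with a non-zero status. `check_launcher` is a hypothetical helper name, not part of MAKER or Intel MPI; the launcher is passed by full path so the result does not depend on PATH.

```shell
#!/bin/sh
# check_launcher LAUNCHER
# Hypothetical helper (not shipped with MAKER or Intel MPI).
# Runs "LAUNCHER -n 2 echo Hello" and reports whether the launcher itself starts.
check_launcher() {
    launcher=$1
    # A missing or non-executable path is reported separately from a failed launch.
    if [ ! -x "$launcher" ]; then
        echo "MISSING: $launcher"
        return 2
    fi
    # Discard the launcher's output; only its exit status is used here.
    if "$launcher" -n 2 echo Hello >/dev/null 2>&1; then
        echo "OK: $launcher"
        return 0
    fi
    echo "FAILED: $launcher"
    return 1
}
```

For example, calling `check_launcher /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec` and then the same path ending in `mpiexec.hydra` on the build node would show which of the two launchers starts there. It says nothing about MAKER itself.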
Thanks,
Carson
From dussert.yann at gmail.com Mon Mar 6 10:51:59 2017
From: dussert.yann at gmail.com (YannDussert)
Date: Mon, 6 Mar 2017 17:51:59 +0100
Subject: [maker-devel] Differences in non_overlapping protein file between runs
Message-ID: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>

Hello,

First, thank you for developing MAKER, this is a great annotation tool!

I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set.
I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.

After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000 and 17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation on a genome browser, one of the problems was apparently gene fusions due to bad protein evidence.

Following the advice on another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence; did I miss something?

Thank you in advance for your answer.

Best regards,
Yann
From dcg at cau.edu.cn Sun Mar 5 05:26:59 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Sun, 5 Mar 2017 19:26:59 +0800
Subject: [maker-devel] For help about masking repeats before annotation
Message-ID: <2017030519265949065818@cau.edu.cn>

Dear sir:
Before the MAKER operations, I mask repeats on my contigs first. However, when I followed "Repeat Library Construction-Advanced", no results were generated after running LTRharvest, so I couldn't go any further. When I attempted to follow "Repeat Library Construction-Basic" to run RepeatModeler, a note caught my attention, even though RECON returned some results:

NOTE: RepeatScout did not return any models.

Is the situation above normal in the masking process? How can I deal with these problems to make a high-quality repeat library for my assembled contigs?

Hope to hear from you.
Best wishes!
Chao Chao
2017.03.05
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dcg at cau.edu.cn Mon Mar 6 06:24:17 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Mon, 6 Mar 2017 20:24:17 +0800
Subject: [maker-devel] How to merge the annotation results into chromosomes?
Message-ID: <2017030620241723514513@cau.edu.cn>

Dear sir:
Hello, I am doing my utmost to study annotation now. However, I have recently been confused about handling the results. After alignment, training and curation, we can get good gene models and merge them with gff3_merge and fasta_merge. But how can I merge them into different chromosomes, like Homo_sapiens.GRCh38.87.chromosome.11.gff3.gz? I don't just want results for separate contigs.

I'm looking forward to your reply. Thanks a lot!
Best wishes!

Chao Chao
2017.03.06
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From lucys-world at mailbox.org Mon Mar 6 08:40:33 2017
From: lucys-world at mailbox.org (lucys-world at mailbox.org)
Date: Mon, 6 Mar 2017 15:40:33 +0100 (CET)
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
Message-ID: <850873370.6534.1488811234072@office.mailbox.org>

Dear maker-devel group,

I have some issues with my maker ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP. After two maker runs I wanted to create a new HMM for the third maker run, but the command

fathom genome.ann genome.dna -gene-stats

resulted in 0 genes.

What I have done so far:

* For the first training run I only used BUSCO and the Swiss-Prot database as references (since no ESTs are available for my species).
  Additionally I set protein2genome=1.
* I was able to create an HMM based on all merged *.gff files, but these were not many:
  out of 27,032 scaffolds (sequences) only 280 were used for the HMM. Here are the gene-stats:

  280 sequences
  0.458676 avg GC fraction (min=0.338014 max=0.708052)
  7445 genes (plus=3192 minus=4253)
  1621 (0.217730) single-exon
  5824 (0.782270) multi-exon
  168.412018 mean exon (min=1 max=5224)
  1464.349243 mean intron (min=30 max=41197)

* For the second maker run I then used this HMM and again the BUSCO+SwissPort.fasta reference file.
  The gene-stats for the output of the second maker run are:

  282 sequences
  0.473125 avg GC fraction (min=0.338014 max=0.725131)
  0 genes (plus=0 minus=0)
  0 (-nan) single-exon
  0 (-nan) multi-exon
  -nan mean exon (min=2147483647 max=0)
  -nan mean intron (min=2147483647 max=0)

Would you recommend rerunning everything, e.g. with an additional Augustus gene prediction (species=human), or with ESTs from related species? (If so, how closely related?)

Thank you for your time and help.

Kind regards,
Lucy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From d.ence at ufl.edu Mon Mar 6 11:11:57 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Mon, 6 Mar 2017 17:11:57 +0000
Subject: [maker-devel] How to merge the annotation results into chromosomes?
In-Reply-To: <2017030620241723514513@cau.edu.cn>
References: <2017030620241723514513@cau.edu.cn>
Message-ID: <45D1390D-212D-42A4-9819-C0045601B013@mail.ufl.edu>

Hi,

Do you have data that can precisely place each of your contigs in their position on the chromosome? Without that, this isn't even possible, since a gff3 file with the chromosomes instead of the contigs requires each contig's position in the chromosome. And in any case, I don't think there is a script in the maker tools that does what you're asking. Maybe someone else has made a script to do that.
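
To illustrate why the placement data is required, here is a rough sketch of the coordinate shift such a script would perform. It is not a MAKER tool: `lift_gff3` is a hypothetical name, and the placement table is an assumed tab-separated file with three columns (contig name, chromosome name, 1-based start of the contig on the chromosome). The sketch assumes every contig is placed in forward orientation; reverse-oriented contigs would also need their coordinates and strands flipped.

```shell
#!/bin/sh
# lift_gff3 PLACEMENT_TSV CONTIG_GFF3
# Hypothetical helper, not part of MAKER.
# PLACEMENT_TSV columns (assumed format): contig <TAB> chromosome <TAB> 1-based start.
# Rewrites columns 1, 4 and 5 of each feature; features on unplaced contigs are dropped.
lift_gff3() {
    awk -F'\t' -v OFS='\t' '
        # First file: remember the target chromosome and offset of each contig.
        NR == FNR { chrom[$1] = $2; offset[$1] = $3 - 1; next }
        # Stop at the embedded sequence section that MAKER appends to its GFF3.
        /^##FASTA/ { exit }
        # Keep only the version directive; other directives name contigs, not chromosomes.
        /^#/ { if ($0 ~ /^##gff-version/) print; next }
        # Shift start and end, then replace the contig name with the chromosome name.
        ($1 in chrom) {
            $4 += offset[$1]
            $5 += offset[$1]
            $1 = chrom[$1]
            print
        }
    ' "$1" "$2"
}
```

With a placement line `ctg2<TAB>chr1<TAB>1001`, a feature at 10..50 on ctg2 would be written as 1010..1050 on chr1. The placement table must not be empty, since the two-file awk idiom used here relies on it having at least one line.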
~Daniel
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From d.ence at ufl.edu Mon Mar 6 11:15:07 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Mon, 6 Mar 2017 17:15:07 +0000
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
In-Reply-To: <850873370.6534.1488811234072@office.mailbox.org>
References: <850873370.6534.1488811234072@office.mailbox.org>
Message-ID: <970801D9-536E-494C-B5C7-F5F72125FAFC@mail.ufl.edu>

Hi Lucy,

What were your settings for the second training run? Did you leave protein2genome=1?

~Daniel

On Mar 6, 2017, at 9:40 AM, lucys-world at mailbox.org wrote:

Dear maker-devel group,

I have some issues with my maker ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP. After two maker runs I wanted to create a new HMM for the third maker run, but the command

fathom genome.ann genome.dna -gene-stats

resulted in 0 genes.

What I have done so far:

* For the first training run I only used BUSCO and the Swiss-Prot database as references (since no ESTs are available for my species).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Mon Mar 6 13:48:49 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 6 Mar 2017 12:48:49 -0700
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
In-Reply-To: <850873370.6534.1488811234072@office.mailbox.org>
References: <850873370.6534.1488811234072@office.mailbox.org>
Message-ID: <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com>

It looks like you have no genes to train with. So you did something wrong on your second run. Either no gene predictor was running or you provided no evidence for the predictor, so you produced no models.
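
One quick way to check for this before retraining is to count the gene models in the merged GFF3 from the run. The following is a minimal sketch, assuming the final models carry `maker` in the source column and `gene` in the type column; `count_maker_genes` is a hypothetical helper name, not a MAKER script.

```shell
#!/bin/sh
# count_maker_genes MERGED_GFF3
# Hypothetical helper: counts features with source "maker" and type "gene".
count_maker_genes() {
    awk -F'\t' '
        # Ignore the sequence section appended after the annotations.
        /^##FASTA/ { exit }
        $2 == "maker" && $3 == "gene" { n++ }
        # n + 0 prints 0 instead of an empty string when nothing matched.
        END { print n + 0 }
    ' "$1"
}
```

If this prints 0 for the second-round GFF3, there are no models to convert for SNAP training, which would be consistent with fathom reporting 0 genes.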
--Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From qwzhang0601 at gmail.com Tue Mar 7 09:14:11 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 7 Mar 2017 10:14:11 -0500
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To: <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>
Message-ID:

Hi Carson:

I split my contigs into 50 files and annotated them in parallel. After the annotation finished, I used "gff3_merge -d" and "fasta_merge -d" to get the gff and fasta files for each of the 50 files. Now I am trying to merge those gff files into one gff. But I found that, after the annotation information, the contig sequences are appended to the gff files. So I think I cannot simply merge them using the command "cat file1.gff file2.gff ...file50.gff > merged.gff". I am considering two ways to merge those files; would you please tell me which one works?

(1) If the contig sequences will not be useful for downstream functional annotation, then I want to remove all the contig sequences from those gff files, and then merge the gff files with only annotation information using the "cat" command.

(2) Merge the annotation part and the contig sequence part (from those 50 gff files) separately, then merge the two files (i.e., the file including all annotation information, and the file including all the contig sequences) by adding the contig sequences to the end of the annotation information.

Thanks

2017-03-01 16:10 GMT-05:00 Carson Holt :

> That will work.
>
> --Carson
>
> On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote:
>
> Thank you. I have submitted my jobs to our server. What I plan to do is this: (1) split contigs into 50 files; (2) for each contig file, collect the annotation into gff and protein sequences into fasta format; (3) manually merge the 50 gff files and protein sequence files. Is what I am doing also correct?
>
> Best
> Quanwei
>
> 2017-03-01 15:54 GMT-05:00 Carson Holt:
>
>> If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue.
>>
>> --Carson
>>
>> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>>
>> Thank you. But I ran into some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation of each file, I will merge the results.
>>
>> Thank you for your explanation!
>>
>> Best
>> Quanwei
>>
>> 2017-03-01 15:36 GMT-05:00 Carson Holt:
>>
>>> If you submit too many simultaneous MAKER runs, file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (MAKER must be configured and compiled to use MPI).
>>>
>>> An example MPI launch command for running on 200 CPUs on a cluster ->
>>> mpiexec -n 200 maker 2> maker_mpi1.error
>>>
>>> --Carson
>>>
>>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>>> >
>>> > Hello:
>>> >
>>> > I am doing genome annotation using MAKER on our high-performance computing (HPC) cluster. Due to some MPI issues, I submitted the MAKER job several times under the same directory. Following the example in the protocol (shown below), I run the jobs as background processes with "&", except the first one. Is this necessary when I submit a job to an HPC cluster? It took much longer than I expected (based on a test with a smaller data set). I am not sure whether running the processes in the background leads to this issue.
>>> >
>>> > The example in the protocol
>>> > % maker 2> maker1.error
>>> > % maker 2> maker2.error &
>>> > % maker 2> maker3.error &
>>> > ......
>>> >
>>> > BTW, will the annotation of a shorter contig (e.g., 500 bp) take ~1/100 of the time needed for a 50,000 bp contig? I am using SNAP for ab initio prediction, with an RNA-seq assembly and protein sequences as evidence. More than half of my contigs are shorter than 300 bp (their total length is only about 5% of the total length of all contigs). I want to know whether I would save about half (or only about 5%) of the time if I ignore those short contigs.
>>> >
>>> > Thanks
>>> >
>>> > Best
>>> > Quanwei
>>> > _______________________________________________
>>> > maker-devel mailing list
>>> > maker-devel at box290.bluehost.com
>>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue Mar 7 09:35:42 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 7 Mar 2017 08:35:42 -0700
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To:
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>
Message-ID:

Use gff3_merge again without the -d option. Just give it all 50 files.

--Carson

Sent from my iPhone

> On Mar 7, 2017, at 8:14 AM, Quanwei Zhang wrote:
>
> Hi Carson:
>
> I split my contigs into 50 files and annotated them in parallel. After the annotation finished, I used "gff3_merge -d" and "fasta_merge -d" to get the GFF and FASTA files for each of the 50 files. Now I am trying to merge those GFF files into one. But I found that the contig sequences are appended to each GFF file after the annotation information, so I think I cannot simply merge them with "cat file1.gff file2.gff ...file50.gff > merged.gff".
> I am considering two ways to merge the files; could you please tell me which one works?
> (1) If the contig sequences are not needed for downstream functional annotation, remove all contig sequences from the GFF files and then merge the annotation-only files with "cat".
> (2) Merge the annotation parts and the contig sequence parts (from the 50 GFF files) separately, then combine the two files by appending the file with all contig sequences to the end of the file with all annotation information.
>
> Thanks
>
> 2017-03-01 16:10 GMT-05:00 Carson Holt:
>> That will work.
>>
>> --Carson
>>
>>> On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote:
>>>
>>> Thank you. I have submitted my jobs to our server. What I plan to do is this: (1) split the contigs into 50 files; (2) for each contig file, collect the annotation in GFF format and the protein sequences in FASTA format; (3) manually merge the 50 GFF files and the protein sequence files. Is what I am doing also correct?
>>>
>>> Best
>>> Quanwei
>>>
>>> 2017-03-01 15:54 GMT-05:00 Carson Holt:
>>>> If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue.
>>>>
>>>> --Carson
>>>>
>>>>> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>>>>>
>>>>> Thank you. But I ran into some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation of each file, I will merge the results.
>>>>>
>>>>> Thank you for your explanation!
>>>>>
>>>>> Best
>>>>> Quanwei
>>>>>
>>>>> 2017-03-01 15:36 GMT-05:00 Carson Holt:
>>>>>> If you submit too many simultaneous MAKER runs, file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (MAKER must be configured and compiled to use MPI).
>>>>>>
>>>>>> An example MPI launch command for running on 200 CPUs on a cluster ->
>>>>>> mpiexec -n 200 maker 2> maker_mpi1.error
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>>>>>> >
>>>>>> > Hello:
>>>>>> >
>>>>>> > I am doing genome annotation using MAKER on our high-performance computing (HPC) cluster. Due to some MPI issues, I submitted the MAKER job several times under the same directory. Following the example in the protocol (shown below), I run the jobs as background processes with "&", except the first one. Is this necessary when I submit a job to an HPC cluster? It took much longer than I expected (based on a test with a smaller data set). I am not sure whether running the processes in the background leads to this issue.
>>>>>> >
>>>>>> > The example in the protocol
>>>>>> > % maker 2> maker1.error
>>>>>> > % maker 2> maker2.error &
>>>>>> > % maker 2> maker3.error &
>>>>>> > ......
>>>>>> >
>>>>>> > BTW, will the annotation of a shorter contig (e.g., 500 bp) take ~1/100 of the time needed for a 50,000 bp contig? I am using SNAP for ab initio prediction, with an RNA-seq assembly and protein sequences as evidence. More than half of my contigs are shorter than 300 bp (their total length is only about 5% of the total length of all contigs). I want to know whether I would save about half (or only about 5%) of the time if I ignore those short contigs.
>>>>>> >
>>>>>> > Thanks
>>>>>> >
>>>>>> > Best
>>>>>> > Quanwei
>>>>>> > _______________________________________________
>>>>>> > maker-devel mailing list
>>>>>> > maker-devel at box290.bluehost.com
>>>>>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
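
Carson's reply in this thread (run gff3_merge again without -d and pass it all 50 files) is the route to use. The sketch here only illustrates why plain "cat" does not work for GFF3 files that carry their sequences, and what option (2) from the question amounts to. It assumes each file marks the start of its sequence section with the GFF3 "##FASTA" directive; `split_gff3` and `merge_gff3` are hypothetical helper names, and the sketch does not check for ID collisions or repeated contigs.

```python
def split_gff3(text):
    """Split GFF3 text into (annotation_lines, fasta_lines) at '##FASTA'."""
    annotation, fasta = [], []
    in_fasta = False
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            in_fasta = True
            continue
        if in_fasta:
            fasta.append(line)
        elif line.strip():
            annotation.append(line)
    return annotation, fasta


def merge_gff3(texts):
    """Merge several GFF3 texts: all annotation first, one FASTA section last."""
    merged_annotation = ["##gff-version 3"]
    merged_fasta = []
    for text in texts:
        annotation, fasta = split_gff3(text)
        # Keep a single version header at the top of the merged file.
        merged_annotation.extend(
            line for line in annotation if not line.startswith("##gff-version")
        )
        merged_fasta.extend(line for line in fasta if line.strip())
    lines = list(merged_annotation)
    if merged_fasta:
        lines.append("##FASTA")
        lines.extend(merged_fasta)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    part1 = (
        "##gff-version 3\n"
        "contig1\tmaker\tgene\t1\t4\t.\t+\t.\tID=g1\n"
        "##FASTA\n>contig1\nACGT\n"
    )
    part2 = (
        "##gff-version 3\n"
        "contig2\tmaker\tgene\t1\t4\t.\t-\t.\tID=g2\n"
        "##FASTA\n>contig2\nTTGA\n"
    )
    print(merge_gff3([part1, part2]), end="")
```

With "cat", the annotation lines of the second file would land after the first file's "##FASTA" section, where a GFF3 parser expects only sequence; collecting the sequence sections at the end avoids that.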

From chrisi.hahni at gmail.com Tue Mar 7 18:51:00 2017
From: chrisi.hahni at gmail.com (Christoph Hahn)
Date: Wed, 8 Mar 2017 01:51:00 +0100
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <119684F8-8071-4318-A129-3D90EC54242A@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com>
Message-ID: <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com>

Hi MAKER community,

I think I am seeing the same issue that Jason reported. I ran cufflinks, then cufflinks2gff3, and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting GFF file from MAKER I only get protein2genome and repeatmasker evidence. If I search the MAKER log, est2genome never comes up. I also tried to extract the cufflinks results as FASTA and feed them to MAKER via 'est='. Still no indication that the evidence is used.

I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time!

cheers,
Christoph

On 10/02/2015 17:56, Carson Holt wrote:
> I ran a few est2genome runs with a cufflinks file I just generated and did not get any issues for EST-based gene models.
>
> I'd like to at least have your test set to see if I can duplicate what you are seeing.
>
> Use this to upload the job files, then I can just run it from my server here -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>
> --Carson
>
>> On Feb 3, 2015, at 11:13 AM, Jason Gallant wrote:
>>
>> Hi Folks,
>>
>> I've nearly succeeded at getting MAKER to run on AWS. I've been checking the output files, and have noticed that none of my RNA-seq data was incorporated in the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting GFF3 files. I even ran the accessory script gff3_merge to check that the resulting file was properly formatted.
>>
>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting MAKER GFF3 file, and the genes predicted from these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step.
>>
>> Any help would (as always) be greatly appreciated!
>>
>> Best,
>> Jason Gallant
>>
>> --
>> Dr. Jason R. Gallant
>> Assistant Professor
>> Room 38 Natural Sciences
>> Department of Zoology
>> Michigan State University
>> East Lansing, MI 48824
>> jgallant at msu.edu
>> office: 517-884-7756
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From o.k.torresen at ibv.uio.no Thu Mar 9 03:36:27 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Mar 2017 09:36:27 +0000
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
Message-ID: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no>

Hi all,

I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately.

First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta.
Is 3.1 available somewhere, or is it going to be released soon?

I also saw a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued by the last sentence: "...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association." Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this.

Thank you.

Sincerely,
Ole

From carsonhh at gmail.com Thu Mar 9 11:52:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Mar 2017 10:52:30 -0700
Subject: [maker-devel] Differences in non_overlapping protein file between runs
In-Reply-To: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>
References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>
Message-ID:

My guess is that there is an issue with the GFF3 file you supplied, so its features are not overlapping anything.

--Carson

> On Mar 6, 2017, at 9:51 AM, YannDussert wrote:
>
> Hello,
>
> First, thank you for developing MAKER, this is a great annotation tool!
>
> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used Augustus, SNAP (both trained with models from the first round) and GeneMark for ab initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.
>
> After the second round, I had only around 11,000 annotated genes (96% completeness with BUSCO v2), whereas I am expecting 13,000-17,000 genes (numbers from other annotated oomycetes). There were only around 1,500 genes in the non_overlapping protein file. After looking at the annotation in a genome browser, one of the problems was apparently gene fusions due to bad protein evidence.
> Following the advice in another post, I tried running MAKER with the ab initio predictions passed via pred_gff, to avoid using bad protein hints for the gene predictors. I still have around 11,000 annotated genes, but now there are 10,000 genes in the non_overlapping protein file. Why this difference? I thought that this file contained gene predictions not supported by any evidence; did I miss something?
>
> Thank you in advance for your answer.
>
> Best regards,
> Yann
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Mar 9 12:39:11 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Mar 2017 11:39:11 -0700
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com>
Message-ID: <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com>

Jason never responded to this one or uploaded his file for testing. He probably figured it out off list.

My guess is that your results are too fragmented to build a model that can pass the filtering thresholds.

If you want, I can take a look. You can upload all files for a test job here -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

--Carson

> On Mar 7, 2017, at 5:51 PM, Christoph Hahn wrote:
>
> Hi MAKER community,
>
> I think I am seeing the same issue that Jason reported. I ran cufflinks, then cufflinks2gff3, and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting GFF file from MAKER I only get protein2genome and repeatmasker evidence. If I search the MAKER log, est2genome never comes up. I also tried to extract the cufflinks results as FASTA and feed them to MAKER via 'est='. Still no indication that the evidence is used.
>
> I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time!
>
> cheers,
> Christoph
>
> On 10/02/2015 17:56, Carson Holt wrote:
>> I ran a few est2genome runs with a cufflinks file I just generated and did not get any issues for EST-based gene models.
>>
>> I'd like to at least have your test set to see if I can duplicate what you are seeing.
>>
>> Use this to upload the job files, then I can just run it from my server here -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>
>> --Carson
>>
>>> On Feb 3, 2015, at 11:13 AM, Jason Gallant wrote:
>>>
>>> Hi Folks,
>>>
>>> I've nearly succeeded at getting MAKER to run on AWS. I've been checking the output files, and have noticed that none of my RNA-seq data was incorporated in the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting GFF3 files. I even ran the accessory script gff3_merge to check that the resulting file was properly formatted.
>>>
>>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting MAKER GFF3 file, and the genes predicted from these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step.
>>>
>>> Any help would (as always) be greatly appreciated!
>>>
>>> Best,
>>> Jason Gallant
>>>
>>> --
>>> Dr. Jason R.
Gallant
>>> Assistant Professor
>>> Room 38 Natural Sciences
>>> Department of Zoology
>>> Michigan State University
>>> East Lansing, MI 48824
>>> jgallant at msu.edu
>>> office: 517-884-7756
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Mar 9 12:51:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Mar 2017 11:51:25 -0700
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
In-Reply-To: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no>
References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no>
Message-ID: <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com>

Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus.

It can be used to identify genes on a new reference or to update existing gene models (this requires that the existing models be in GFF3 against the reference genome).

I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome and a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc.

It uses the prediction tool described in this paper (the paper contains several examples of variation we can properly predict against) -> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure

--Carson

> On Mar 9, 2017, at 2:36 AM, Ole Kristian Tørresen wrote:
>
> Hi all,
> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately.
>
> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon?
>
> I also saw a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued by the last sentence: "...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association." Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this.
>
> Thank you.
>
> Sincerely,
> Ole
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From lucys-world at mailbox.org Tue Mar 7 02:39:40 2017
From: lucys-world at mailbox.org (lucys-world at mailbox.org)
Date: Tue, 7 Mar 2017 09:39:40 +0100 (CET)
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
In-Reply-To: <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com>
References: <850873370.6534.1488811234072@office.mailbox.org> <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com>
Message-ID: <1407048207.7112.1488875981292@office.mailbox.org>

Hello Carson, hello Daniel,

thank you for your fast reply and help.

To Daniel's question: Yes, unfortunately I had protein2genome=1 in all runs.

To Carson: After reading a lot in the forum I realized that I had misunderstood ab initio gene prediction. I thought one had to perform three MAKER runs in total: one training run and then two MAKER runs for annotation. But now I think there are only two MAKER runs to perform in total (one training run and then one annotation run). Is that correct?

So after my first run I created an HMM based on the first gene-stats (with 7445 genes) and performed my second run with this HMM. Then I tried to create a new HMM based on the output of my second run. I think that is not necessary, since the output of the second run should be my annotated genome?

I think I have to redo my MAKER runs, and for that I have two questions regarding maker_opts.ctl:

1. Training run: For that I have to give MAKER my genome and my evidence (in my case the BUSCO and Swiss-Prot data sets) and set protein2genome=1. Since that is my only evidence, I don't change anything else? (I don't add anything in the gene prediction section?)

2. Annotation run: With the GFF output of the training run I create my own HMM with SNAP. In maker_opts.ctl I then add my SNAP HMM for this annotation run and set the Augustus species to the most closely related species (as recommended in the Augustus manual). Is that correct? Do I also give my protein evidence as I did in the training run?

Thank you very much for your time and help with that!

- Lucy

> On 6 March 2017 at 20:48, Carson Holt wrote:
>
> It looks like you have no genes to train with. So you did something wrong on your second run. Either no gene predictor was running or you provided no evidence for the predictor, so you produced no models.
>
> --Carson
>
> > On Mar 6, 2017, at 7:40 AM, lucys-world at mailbox.org wrote:
> >
> > Dear maker-devel group,
> >
> > I have some issues with my MAKER ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP.
> > After two MAKER runs I wanted to create a new HMM for the third MAKER run, but the command
> >
> > fathom genome.ann genome.dna -gene-stats
> >
> > resulted in 0 genes.
> >
> > What I have done so far:
> >
> > * For the first training run I used only the BUSCO and Swiss-Prot databases as references (since no ESTs are available for my species). Additionally I set protein2genome=1.
> >
> > * I was able to create an HMM based on all merged *.gff files, but these were not many: out of 27,032 scaffolds (sequences) only 280 were used for the HMM. Here are the gene-stats:
> >
> > 280 sequences
> > 0.458676 avg GC fraction (min=0.338014 max=0.708052)
> > 7445 genes (plus=3192 minus=4253)
> > 1621 (0.217730) single-exon
> > 5824 (0.782270) multi-exon
> > 168.412018 mean exon (min=1 max=5224)
> > 1464.349243 mean intron (min=30 max=41197)
> >
> > * For the second MAKER run I then used this HMM and again the BUSCO+SwissPort.fasta reference file. The gene-stats for the output of the second MAKER run are:
> >
> > 282 sequences
> > 0.473125 avg GC fraction (min=0.338014 max=0.725131)
> > 0 genes (plus=0 minus=0)
> > 0 (-nan) single-exon
> > 0 (-nan) multi-exon
> > -nan mean exon (min=2147483647 max=0)
> > -nan mean intron (min=2147483647 max=0)
> >
> > Would you recommend rerunning everything, e.g.
> > with an additional Augustus gene prediction (species=human), or ESTs from related species? (If so, how closely related?)
> >
> > Thank you for your time and help
> >
> > kind regards
> >
> > Lucy
> >
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From o.k.torresen at ibv.uio.no Thu Mar 9 13:42:31 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Mar 2017 19:42:31 +0000
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
In-Reply-To: <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com>
References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com>
Message-ID: <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no>

Hi Carson.

In the article I linked to, "The draft genome sequence and annotation of the desert woodrat Neotoma lepida" (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/)."

So I guess this version is not available, or maybe they meant 3.0beta1 or something.

ACE looks like a really cool tool; I'll pass it on to people who have the right datasets.

Thank you.

Ole

> On 09 Mar 2017, at 19:51, Carson Holt wrote:
>
> Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus.
>
> It can be used to identify genes on a new reference or to update existing gene models (this requires that the existing models be in GFF3 against the reference genome).
>
> I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome and a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc.
>
> It uses the prediction tool described in this paper (the paper contains several examples of variation we can properly predict against) -> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure
>
> --Carson
>
>> On Mar 9, 2017, at 2:36 AM, Ole Kristian Tørresen wrote:
>>
>> Hi all,
>> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately.
>>
>> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon?
>>
>> I also saw a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued by the last sentence: "...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association." Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this.
>>
>> Thank you.
>> >> Sincerely, >> Ole >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Thu Mar 9 13:50:10 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 9 Mar 2017 12:50:10 -0700 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing In-Reply-To: <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> Message-ID: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> My guess is that Michael may have called it 3.1 because he used the subversion repository which is beyond the 3.0-beta download but has not been packaged for release yet. ?Carson > On Mar 9, 2017, at 12:42 PM, Ole Kristian T?rresen wrote: > > Hi Carson. > > In the article I linked to, The draft genome sequence and annotation of the desert woodrat Neotoma lepida (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/).? > > So I guess this version is not available, or maybe they meant 3.0beta1 or something. > > ACE looks like a really cool tool, I?ll pass it on to people that have the correct datasets. > > Thank you. > > Ole > >> On 09 Mar 2017, at 19:51, Carson Holt wrote: >> >> Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus. >> >> It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome). 
>> >> I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. >> >> It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) ?> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure >> >> ?Carson >> >> >> >>> On Mar 9, 2017, at 2:36 AM, Ole Kristian T?rresen wrote: >>> >>> Hi all, >>> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. >>> >>> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? >>> >>> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence ?...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association.? Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this. >>> >>> Thank you. 
>>> >>> Sincerely, >>> Ole >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > From o.k.torresen at ibv.uio.no Thu Mar 9 13:55:00 2017 From: o.k.torresen at ibv.uio.no (=?utf-8?B?T2xlIEtyaXN0aWFuIFTDuHJyZXNlbg==?=) Date: Thu, 9 Mar 2017 19:55:00 +0000 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing In-Reply-To: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> Message-ID: Ah, thank you. That explains it. Ole > On 09 Mar 2017, at 20:50, Carson Holt wrote: > > My guess is that Michael may have called it 3.1 because he used the subversion repository which is beyond the 3.0-beta download but has not been packaged for release yet. > > ?Carson > > >> On Mar 9, 2017, at 12:42 PM, Ole Kristian T?rresen wrote: >> >> Hi Carson. >> >> In the article I linked to, The draft genome sequence and annotation of the desert woodrat Neotoma lepida (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/).? >> >> So I guess this version is not available, or maybe they meant 3.0beta1 or something. >> >> ACE looks like a really cool tool, I?ll pass it on to people that have the correct datasets. >> >> Thank you. >> >> Ole >> >>> On 09 Mar 2017, at 19:51, Carson Holt wrote: >>> >>> Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus. 
>>> >>> It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome). >>> >>> I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. >>> >>> It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) ?> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure >>> >>> ?Carson >>> >>> >>> >>>> On Mar 9, 2017, at 2:36 AM, Ole Kristian T?rresen wrote: >>>> >>>> Hi all, >>>> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. >>>> >>>> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? >>>> >>>> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence ?...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association.? Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this. >>>> >>>> Thank you. 
>>>> >>>> Sincerely, >>>> Ole >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >> > From o.k.torresen at ibv.uio.no Thu Mar 9 13:59:35 2017 From: o.k.torresen at ibv.uio.no (=?utf-8?B?T2xlIEtyaXN0aWFuIFTDuHJyZXNlbg==?=) Date: Thu, 9 Mar 2017 19:59:35 +0000 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing In-Reply-To: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> Message-ID: <0B73432A-E0EE-4983-8314-E8A94AADA74F@ibv.uio.no> Ah, thank you. That explains it. Ole > On 09 Mar 2017, at 20:50, Carson Holt wrote: > > My guess is that Michael may have called it 3.1 because he used the subversion repository which is beyond the 3.0-beta download but has not been packaged for release yet. > > ?Carson > > >> On Mar 9, 2017, at 12:42 PM, Ole Kristian T?rresen wrote: >> >> Hi Carson. >> >> In the article I linked to, The draft genome sequence and annotation of the desert woodrat Neotoma lepida (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/).? >> >> So I guess this version is not available, or maybe they meant 3.0beta1 or something. >> >> ACE looks like a really cool tool, I?ll pass it on to people that have the correct datasets. >> >> Thank you. >> >> Ole >> >>> On 09 Mar 2017, at 19:51, Carson Holt wrote: >>> >>> Currently only 3.0 beta is available. 
It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus. >>> >>> It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome). >>> >>> I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. >>> >>> It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) ?> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure >>> >>> ?Carson >>> >>> >>> >>>> On Mar 9, 2017, at 2:36 AM, Ole Kristian T?rresen wrote: >>>> >>>> Hi all, >>>> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. >>>> >>>> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? >>>> >>>> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence ?...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association.? Is this part of MAKER 3.1, or a separate effort? 
I am very interested in the status of this. >>>> >>>> Thank you. >>>> >>>> Sincerely, >>>> Ole >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >> > From chrisi.hahni at gmail.com Fri Mar 10 02:50:52 2017 From: chrisi.hahni at gmail.com (Christoph Hahn) Date: Fri, 10 Mar 2017 09:50:52 +0100 Subject: [maker-devel] Est2Genome Problems In-Reply-To: <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com> References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com> Message-ID: <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com> Dear Carson, Thanks for getting in touch! I actually managed in the end. I converted the gtf I had from cufflinks to gff3 via the script 'gtf2gff.pl' from augustus and then used the script 'gffGetmRNA.pl' again from augustus to extract the mRNA in fasta. This file I fed to MAKER via the 'est=' route and now I get plenty of est2genome evidence in the maker result. So the problem seems to be limited to the route 'est_gff=', allthough there is no error message whatsoever the est2genome routine seems to never be triggered. I'd still be happy to upload my data (the cufflinks gff, the genome fasta, anything else?) if you want to try to reproduce the problem. Let me know! btw I seem to be unable to create a new topic or respond to topics via google groups. Is the list closed or the access restricted somehow. I only managed by responding to Jason's mail which I still had in my inbox directly via my gmail. Thanks! cheers, Christoph On 09/03/2017 19:39, Carson Holt wrote: > Jason never responded back to this one or uploaded his file to test. > He probably figured it out off list. 
> My guess is that your results are too fragmented to build a model that can pass filtering thresholds with.
>
> If you want I can take a look. You can upload all files for a test job here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>
> --Carson
>
>> On Mar 7, 2017, at 5:51 PM, Christoph Hahn wrote:
>>
>> Hi MAKER community,
>>
>> I think I am seeing the same issue that Jason has reported. I ran cufflinks, then cufflinks2gff3 and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting gff file from maker I only get protein2genome and repeatmasker evidence. If I do a search in the maker log est2genome never comes up. Tried to extract the cufflinks results as fasta and feed to MAKER via 'est='. Still no indication that the evidence is used.
>>
>> I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time!
>>
>> cheers,
>> Christoph
>>
>> On 10/02/2015 17:56, Carson Holt wrote:
>>> I ran a few est2genome runs with a cufflinks file I just generated and did not get any issues for EST based gene models.
>>>
>>> I'd like to at least have your test set to see if I can duplicate what you are seeing.
>>>
>>> Use this to upload the job files then I can just run it from my server here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> --Carson
>>>
>>>> On Feb 3, 2015, at 11:13 AM, Jason Gallant wrote:
>>>>
>>>> Hi Folks,
>>>>
>>>> I've nearly succeeded at getting MAKER to run on AWS... I've been checking the output files, and have noticed that none of my RNAseq data was incorporated on the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting gff3 files. I even ran the accessory script gff3merge to check that the resulting file was properly formatted.
>>>>
>>>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting maker gff3 file, and the genes predicted by these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step.
>>>>
>>>> Any help would (as always) be greatly appreciated!
>>>>
>>>> Best,
>>>> Jason Gallant
>>>>
>>>> --
>>>> Dr. Jason R. Gallant
>>>> Assistant Professor
>>>> Room 38 Natural Sciences
>>>> Department of Zoology
>>>> Michigan State University
>>>> East Lansing, MI 48824
>>>> jgallant at msu.edu
>>>> office: 517-884-7756

From dussert.yann at gmail.com Fri Mar 10 04:53:36 2017
From: dussert.yann at gmail.com (YannDussert)
Date: Fri, 10 Mar 2017 11:53:36 +0100
Subject: [maker-devel] Differences in non_overlapping protein file between runs
In-Reply-To:
References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>
Message-ID: <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com>

Hi,

Thank you for your answer. To get my gff with ab-initio predictions, I just took the corresponding lines in the maker gff from the previous round.

I can't see any problem with it, it looks like this:

Plvit001 augustus_masked match 66626 70338 0.85 + . ID=Plvit001:hit:12095:4.5.0.0;Name=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1
Plvit001 augustus_masked match_part 66626 67586 0.85 + . ID=Plvit001:hsp:27621:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1 961 +;Gap=M961
Plvit001 augustus match 66626 70338 1 + . ID=Plvit001:hit:12088:4.5.0.0;Name=augustus-Plvit001-abinit-gene-0.0-mRNA-1
Plvit001 augustus match_part 66626 70096 1 + . ID=Plvit001:hsp:27610:4.5.0.0;Parent=Plvit001:hit:12088:4.5.0.0;Target=augustus-Plvit001-abinit-gene-0.0-mRNA-1 1 3471 +;Gap=M3471
Plvit001 augustus_masked match_part 68166 68486 0.85 + . ID=Plvit001:hsp:27622:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 962 1282 +;Gap=M321
Plvit001 augustus_masked match_part 69504 70096 0.85 + . ID=Plvit001:hsp:27623:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1283 1875 +;Gap=M593
Plvit001 augustus_masked match_part 70174 70338 0.85 + . ID=Plvit001:hsp:27624:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1876 2040 +;Gap=M165

Best regards,

Yann

On 09/03/2017 18:52, Carson Holt wrote:
> My guess is that there is either an issue with the GFF3 file you supplied, so its features are not overlapping anything.
>
> --Carson
>
>> On Mar 6, 2017, at 9:51 AM, YannDussert wrote:
>>
>> Hello,
>>
>> First, thank you for developing MAKER, this is a great annotation tool!
>>
>> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.
>>
>> After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000-17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation on a genome browser, one of the problems was apparently gene fusions due to bad protein evidence. Following the advice on another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence, did I miss something?
>>
>> Thank you in advance for your answer.
>>
>> Best regards,
>> Yann

From ereboperezsilva at gmail.com Fri Mar 10 05:05:29 2017
From: ereboperezsilva at gmail.com (=?UTF-8?B?Sm9zw6kgTcKqIEcuIFBlcmV6LVNpbHZh?=)
Date: Fri, 10 Mar 2017 12:05:29 +0100
Subject: [maker-devel] ERROR: Chunk failed
Message-ID:

Hi!

I'm having some trouble understanding the ERROR I'm receiving. Recently I've set up a new machine to annotate a genome (around 2 Gb big) using Maker. We mounted a new disk of 1 TB and loaded onto it the files of an incomplete annotation run (we started it on a different machine and moved it to this one, which had more processing power).
Apparently everything was OK until sometime yesterday, when we received the following ERROR:

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig4633

We are running 16 jobs of maker at the same time, on the unsplit genome. We checked, and the "df" command reported that only 7% of the mounted disk was used. So space does not appear to be the problem... Why that error then?

Thanks for the help.

José María González Pérez-Silva.
PhD student at Universidad de Oviedo.

From ereboperezsilva at gmail.com Fri Mar 10 11:21:38 2017
From: ereboperezsilva at gmail.com (=?UTF-8?B?Sm9zw6kgTcKqIEcuIFBlcmV6LVNpbHZh?=)
Date: Fri, 10 Mar 2017 18:21:38 +0100
Subject: [maker-devel] Maker ERROR
Message-ID:

Hi,

I wrote earlier today about a problem that (apparently) had to do with space. After I deleted some unnecessary files (despite having plenty of storage left), I killed all the processes and set 'clean_try=1' as recommended in this post. Before re-running the processes, we checked that there was no limit on the size of a directory or anything similar.

After re-running, everything seemed correct at first, but when I re-checked some time later, I found a lot of contigs with the status FAILED and no folder specification in the '_master_datastore_index.log', looking like:

Contig480 FAILED
Contig496 FAILED
Contig512 FAILED
Contig528 FAILED
Contig544 FAILED
Contig560 FAILED

But checking the 'nohup.out' of every process (16 in total, as the machine has 16 cores), I notice that each run is, from time to time, processing a contig correctly. So, after several (a lot of) FAILED contigs, it processes one correctly. As said in the previous email, the ERROR displayed in the nohup.out is (including the last part of a processed contig at the beginning):

#--------- command -------------#
Widget::blastx:
/usr/bin/blastall -p blastx -d /data/ge/tmp/maker_VfDQQU/hsap_ensembl%2Efa.mpi.10.6 -i /data/ge/tmp/maker_VfDQQU/0/Contig20.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 4 -U -F T -I T -o /data/ge/round3/cg.maker.output/cg_datastore/56/AC/Contig20//theVoid.Contig20/0/Contig20.0.hsap_ensembl%2Efa.blastx.temp_dir/hsap_ensembl%2Efa.mpi.10.6.blastx
#-------------------------------#
deleted:511 hits
doing blastx of proteins
open3: fork failed: Cannot allocate memory at /home/jmgps/software/maker/bin/../lib/File/NFSLock.pm line 1037.
--> rank=NA, hostname=Planarian2
ERROR: Failed while doing blastx of proteins
ERROR: Chunk failed at level:8, tier_type:3
FAILED CONTIG:Contig20

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:Contig20

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig22

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig24

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig26

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig28

I'm totally lost here. I think it is still processing contigs, but the FAILED attempts slow down the whole process, and we are in a hurry due to the maintenance of the machine. And I can't understand the source of the ERROR.

I will be more than happy to provide more details about the problem, if requested.

Thanks a lot for the help!

From carsonhh at gmail.com Fri Mar 10 11:34:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 Mar 2017 10:34:34 -0700
Subject: [maker-devel] Maker ERROR
In-Reply-To:
References:
Message-ID:

Several things.

1. MAKER does a lot of its work in a temporary directory (usually /tmp). This directory must be locally mounted and cannot be a network mounted location. If this location is full you can get issues.

2. MAKER needs at least 1GB of RAM per process (2-3GB is safer), so if you don't have enough RAM you may need to run fewer processes (with MPI multiply whatever you supplied to the mpiexec -n flag by 1GB).

3. If you are launching MAKER multiple times as opposed to launching once via MPI, you will exacerbate the above limitations as well as open up IO limitations. MAKER can and does saturate IO when run multiple times simultaneously (this is especially true for network mounted locations). If you run via MPI you can greatly reduce IO, so make sure you are using MPI and not just launching MAKER multiple times.

If you absolutely have to start multiple jobs, you can reduce IO somewhat by splitting the input fasta into pieces (use fasta_tool). Give a separate piece to each job via maker's -g flag, and set -base so all results from all jobs get written to the same location. Then each job can avoid multiple file locks that would have been encountered by sharing input. Note that you must rebuild the datastore index using 'maker -dsindex' when all jobs complete.

--Carson
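
[Editor's note] Points 2 and 3 of the reply above come down to a small calculation and a splitting step, which can be sketched in Python roughly as follows. This is only an illustration and not part of MAKER: the function names, the 2 GB-per-process default (taken from the "2-3GB is safer" remark) and the round-robin splitting strategy are all assumptions, and fasta_tool remains the tool the thread recommends for real data.

```python
"""Illustrative helpers only; names and defaults are hypothetical."""


def max_maker_processes(total_ram_gb, cores, gb_per_process=2.0):
    """Return how many processes fit in RAM, capped by the core count.

    Always returns at least 1, even if RAM is below gb_per_process.
    """
    if gb_per_process <= 0:
        raise ValueError("gb_per_process must be positive")
    by_ram = int(total_ram_gb // gb_per_process)
    return max(1, min(cores, by_ram))


def split_fasta(text, n_pieces):
    """Split FASTA text into n_pieces strings, assigning records round-robin.

    Each returned string holds whole records; pieces with no records
    are returned as empty strings.
    """
    if n_pieces < 1:
        raise ValueError("n_pieces must be at least 1")
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A header line starts a new record.
            records.append([line])
        elif records and line.strip():
            # Sequence lines belong to the most recent header.
            records[-1].append(line)
    pieces = [[] for _ in range(n_pieces)]
    for i, rec in enumerate(records):
        pieces[i % n_pieces].append("\n".join(rec))
    return ["\n".join(p) + "\n" if p else "" for p in pieces]
```

As a hypothetical example, max_maker_processes(24, 16) gives 12: under the 2 GB assumption, a 16-core machine with 24 GB of RAM would be limited by memory rather than by cores. Each piece returned by split_fasta would then be written to its own file and handed to a separate job, with the shared output location and the final index rebuild handled as described in the reply.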
From carsonhh at gmail.com Tue Mar 14 11:16:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Mar 2017 10:16:25 -0600
Subject: [maker-devel] Differences in non_overlapping protein file between runs
In-Reply-To: <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com>
References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com> <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com>
Message-ID: <9EC90572-7E3F-4B07-9098-6CAFD7B3A4B0@gmail.com>

I see you have both masked and unmasked augustus calls, so you may have a lot of non-masked predictions in your second run that are entirely contained in transposons and repeat regions (that is why they do not overlap). Really the easiest thing to do would be to open the results in a browser, find one of the ones listed as non-overlapping, and then look at it to see why it is not overlapping. You can then look at that specific location directly in the file as needed, but it will be much easier to interpret looking at the features drawn in a browser (like Apollo - desktop version).

--Carson

> On Mar 10, 2017, at 3:53 AM, YannDussert wrote:
>
> Hi,
>
> Thank you for your answer. To get my gff with ab-initio predictions, I just took the corresponding lines in the maker gff from the previous round.
>
> I can't see any problem with it, it looks like this:
>
> Plvit001 augustus_masked match 66626 70338 0.85 + . ID=Plvit001:hit:12095:4.5.0.0;Name=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1
> Plvit001 augustus_masked match_part 66626 67586 0.85 + . ID=Plvit001:hsp:27621:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1 961 +;Gap=M961
> Plvit001 augustus match 66626 70338 1 + . ID=Plvit001:hit:12088:4.5.0.0;Name=augustus-Plvit001-abinit-gene-0.0-mRNA-1
> Plvit001 augustus match_part 66626 70096 1 + . ID=Plvit001:hsp:27610:4.5.0.0;Parent=Plvit001:hit:12088:4.5.0.0;Target=augustus-Plvit001-abinit-gene-0.0-mRNA-1 1 3471 +;Gap=M3471
> Plvit001 augustus_masked match_part 68166 68486 0.85 + . ID=Plvit001:hsp:27622:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 962 1282 +;Gap=M321
> Plvit001 augustus_masked match_part 69504 70096 0.85 + . ID=Plvit001:hsp:27623:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1283 1875 +;Gap=M593
> Plvit001 augustus_masked match_part 70174 70338 0.85 + . ID=Plvit001:hsp:27624:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1876 2040 +;Gap=M165
>
> Best regards,
>
> Yann
>
> On 09/03/2017 18:52, Carson Holt wrote:
>> My guess is that there is either an issue with the GFF3 file you supplied, so its features are not overlapping anything.
>>
>> --Carson
>>
>>> On Mar 6, 2017, at 9:51 AM, YannDussert wrote:
>>>
>>> Hello,
>>>
>>> First, thank you for developing MAKER, this is a great annotation tool!
>>>
>>> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.
>>>
>>> After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000 and 17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation on a genome browser, one of the problems was apparently gene fusions due to bad protein evidence. Following the advice on another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence, did I miss something?
>>>
>>> Thank you in advance for your answer.
>>>
>>> Best regards,
>>> Yann
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...

From carsonhh at gmail.com Tue Mar 14 11:17:58 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Mar 2017 10:17:58 -0600
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com> <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com>
Message-ID:

Sure. Send me the file. On a side note, I find cufflinks results to be very noisy (lots of false positives). I usually get better results using assembled reads from Trinity (with -jaccard_clip option set), or using Stringtie.

Thanks,
Carson

> On Mar 10, 2017, at 1:50 AM, Christoph Hahn wrote:
>
> Dear Carson,
>
> Thanks for getting in touch! I actually managed in the end. I converted the gtf I had from cufflinks to gff3 via the script 'gtf2gff.pl' from augustus and then used the script 'gffGetmRNA.pl', again from augustus, to extract the mRNA in fasta. This file I fed to MAKER via the 'est=' route and now I get plenty of est2genome evidence in the maker result.
> So the problem seems to be limited to the route 'est_gff='; although there is no error message whatsoever, the est2genome routine seems to never be triggered.
>
> I'd still be happy to upload my data (the cufflinks gff, the genome fasta, anything else?) if you want to try to reproduce the problem. Let me know!
>
> btw I seem to be unable to create a new topic or respond to topics via google groups. Is the list closed or the access restricted somehow? I only managed by responding to Jason's mail, which I still had in my inbox, directly via my gmail.
>
> Thanks!
>
> cheers,
> Christoph
>
> On 09/03/2017 19:39, Carson Holt wrote:
>> Jason never responded back to this one or uploaded his file to test. He probably figured it out off list. My guess is that your results are too fragmented to build a model that can pass filtering thresholds with.
>>
>> If you want I can take a look. You can upload all files for a test job here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>
>> --Carson
>>
>>> On Mar 7, 2017, at 5:51 PM, Christoph Hahn wrote:
>>>
>>> Hi MAKER community,
>>>
>>> I think I am seeing the same issue that Jason has reported. I ran cufflinks, then cufflinks2gff3 and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting gff file from maker I only get protein2genome and repeatmasker evidence. If I do a search in the maker log, est2genome never comes up. I tried to extract the cufflinks results as fasta and feed them to MAKER via 'est='. Still no indication that the evidence is used.
>>>
>>> I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time!
>>>
>>> cheers,
>>> Christoph
>>>
>>> On 10/02/2015 17:56, Carson Holt wrote:
>>>> I ran a few est2genome runs with a cufflinks file I just generated and did not get any issues for EST based gene models.
>>>>
>>>> I'd like to at least have your test set to see if I can duplicate what you are seeing.
>>>>
>>>> Use this to upload the job files then I can just run it from my server here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>
>>>> --Carson
>>>>
>>>>> On Feb 3, 2015, at 11:13 AM, Jason Gallant wrote:
>>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I've nearly succeeded at getting MAKER to run on AWS... I've been checking the output files, and have noticed that none of my RNAseq data was incorporated on the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting gff3 files. I even ran the accessory script gff3merge to check that the resulting file was properly formatted.
>>>>>
>>>>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting maker gff3 file, and the genes predicted by these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step.
>>>>>
>>>>> Any help would (as always) be greatly appreciated!
>>>>>
>>>>> Best,
>>>>> Jason Gallant
>>>>>
>>>>> --
>>>>> Dr. Jason R. Gallant
>>>>> Assistant Professor
>>>>> Room 38 Natural Sciences
>>>>> Department of Zoology
>>>>> Michigan State University
>>>>> East Lansing, MI 48824
>>>>> jgallant at msu.edu
>>>>> office: 517-884-7756
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...

From mnaymik at tgen.org Tue Mar 14 12:29:49 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Tue, 14 Mar 2017 10:29:49 -0700
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
Message-ID:

I have now tried with multiple versions of blast (2.6 and 2.2.28, binaries and built from source) and get the same error:

setting up GFF3 output and fasta chunks
doing blastn of ESTs
running blast search.
#--------- command -------------#
Widget::blastn:
/home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db /scratch/mnaymik/maker/tmp/maker_cah
#-------------------------------#
Error: NCBI C++ Exception:
"/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr
Error: NCBI C++ Exception:
"/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr
examining contents of the fasta file and run log
ERROR: BLASTN failed
--> rank=87, hostname=pnap-pe7-s09
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:6537645

ERROR: BLASTN failed
--> rank=88, hostname=pnap-pe7-s09
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:6537659

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:6537645

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:6537659

On Thu, Mar 2, 2017 at 1:25 PM, Carson Holt wrote:

> Try reinstalling blast, or upgrade to a newer version of blast.
>
> --Carson
>
> On Mar 2, 2017, at 1:05 PM, Marcus Naymik wrote:
>
> I have maker running with MPI and I get this error over and over again for every contig. Any ideas?
>
> MAKER WARNING: All old files will be erased before continuing
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: 5239
> Length: 1395
> #---------------------------------------------------------------------
>
> Error: NCBI C++ Exception:
> "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti
>
> *This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.*
-------------- next part --------------
An HTML attachment was scrubbed...

From carsonhh at gmail.com Tue Mar 14 12:36:07 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Mar 2017 11:36:07 -0600
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To:
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
Message-ID:

The error itself is coming from BLAST. MAKER does provide the command used, so you can try it outside of MAKER. You can submit the files used, as well as the command used, to the BLAST developers for them to test with.

MAKER deletes files on failure, but if you edit .../maker/lib/GI.pm, you can stop it from deleting files.

Edit line 58 by setting CLEANUP => 0

Then you should be able to grab whatever files maker used to run blast, and copy the blast command used from STDERR.
--Carson

> On Mar 14, 2017, at 11:29 AM, Marcus Naymik wrote:
>
> I have now tried with multiple versions of blast (2.6 and 2.28 binaries and built from source) and get the same error:
>
> setting up GFF3 output and fasta chunks
> doing blastn of ESTs
> running blast search.
> #--------- command -------------#
> Widget::blastn:
> /home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db /scratch/mnaymik/maker/tmp/maker_cah
> #-------------------------------#
> Error: NCBI C++ Exception:
> "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr
>
> Error: NCBI C++ Exception:
> "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr
>
> examining contents of the fasta file and run log
> ERROR: BLASTN failed
> --> rank=87, hostname=pnap-pe7-s09
> ERROR: Failed while doing blastn of ESTs
> ERROR: Chunk failed at level:0, tier_type:3
> FAILED CONTIG:6537645
>
> ERROR: BLASTN failed
> --> rank=88, hostname=pnap-pe7-s09
> ERROR: Failed while doing blastn of ESTs
> ERROR: Chunk failed at level:0, tier_type:3
> FAILED CONTIG:6537659
>
> ERROR: Chunk failed at level:4, tier_type:0
> FAILED CONTIG:6537645
>
> ERROR: Chunk failed at level:4, tier_type:0
> FAILED CONTIG:6537659
-------------- next part --------------
An HTML attachment was scrubbed...
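
A minimal sketch of the "copy the blast command used from STDERR" step in the reply above, assuming the job's STDERR has been captured as text and that commands are always bracketed by the two marker lines seen in the quoted log. The function name `extract_commands` and the sample string are illustrative only; the command in the sample is the truncated one from the post, not a complete blastn call.

```python
import re

# Marker lines as they appear in the log excerpt quoted in this thread.
START = re.compile(r"^#-+ command -+#\s*$")
END = re.compile(r"^#-+#\s*$")


def extract_commands(log_text):
    """Return (label, command) pairs found between the command markers."""
    commands = []
    block = None  # None while we are outside a command block
    for line in log_text.splitlines():
        stripped = line.strip()
        if START.match(stripped):
            block = []
        elif block is not None and END.match(stripped):
            if block:
                # First line is the label (e.g. "Widget::blastn:"),
                # the remaining lines are the command itself.
                label = block[0].rstrip(":")
                commands.append((label, " ".join(block[1:])))
            block = None
        elif block is not None and stripped:
            block.append(stripped)
    return commands


# Sample text copied from the thread (the command line is truncated there).
sample = """doing blastn of ESTs
running blast search.
#--------- command -------------#
Widget::blastn:
/home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db /scratch/mnaymik/maker/tmp/maker_cah
#-------------------------------#
ERROR: BLASTN failed
"""

for label, command in extract_commands(sample):
    print(label, "->", command)
```

With CLEANUP => 0 set as described, the files named in the extracted command should still be on disk, so the command can be rerun by hand outside of MAKER.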
From michael.s.campbell1 at gmail.com Tue Mar 14 21:27:10 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 14 Mar 2017 22:27:10 -0400
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <2017030519265949065818@cau.edu.cn>
References: <2017030519265949065818@cau.edu.cn>
Message-ID: <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com>

Hi Chao Chao,

I've not run into this before. Could you post the RepeatModeler command you used?

Thanks,
Mike

> On Mar 5, 2017, at 6:26 AM, dcg at cau.edu.cn wrote:
>
> Dear sir:
> Before the MAKER operations, I do repeat masking first on my contigs.
> However, when I followed "Repeat Library Construction-Advanced", no results were generated after running LTRharvest, so I couldn't go any further.
>
> When I attempted to follow "Repeat Library Construction-Basic" to run RepeatModeler, a note caught my attention even though RECON can return some results:
> NOTE: RepeatScout did not return any models.
>
> Is the situation above normal in the masking process? How can I deal with these problems to make a high-quality repeat library for my assembled contigs?
>
> Hope to hear from you.
> Best wishes!
>
> Chao Chao
> 2017.03.05
-------------- next part --------------
An HTML attachment was scrubbed...

From dcg at cau.edu.cn Wed Mar 15 09:26:15 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Wed, 15 Mar 2017 22:26:15 +0800
Subject: [maker-devel] How to get Pseudogene
Message-ID: <2017031522261575294011@cau.edu.cn>

Dear sir:
I'd like to mask some pseudogenes in my annotation. How can I do it?
In the guide, the first step is "Run a tblastn of the protein sequence (query) vs. the intergenic genome sequence (subject/database)".
My question is: what do "the protein sequence" and "the intergenic genome sequence" refer to, respectively? My own protein database? And how do I use the result in the MAKER annotation?

Best wishes!

Chao Chao
2017.03.15
-------------- next part --------------
An HTML attachment was scrubbed...

From michael.s.campbell1 at gmail.com Wed Mar 15 10:00:13 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 15 Mar 2017 11:00:13 -0400
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <201703152048212561203@cau.edu.cn>
References: <2017030519265949065818@cau.edu.cn> <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com> <201703152048212561203@cau.edu.cn>
Message-ID: <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com>

Hi Chao Chao,

I'm not sure how to troubleshoot this if there were no error messages. I've cc'd a couple of people that have worked with this protocol much more than I have.

Ning and Kevin,
Do you have any tips for running these tools that may help Chao Chao?

Thanks,
Mike

> On Mar 15, 2017, at 8:48 AM, dcg at cau.edu.cn wrote:
>
> Thanks for your reply!
> I just followed the guide at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced
>
> To use LTRharvest, my commands are as below (the filenames are my own choice):
> DIR1/gt suffixerator -db seqfile -indexname seqfileindex -tis -suf -lcp -des -ssp -dna
> DIR1/gt ltrharvest -index seqfileindex -out seqfile.out99 -outinner seqfile.outinner99 -gff3 seqfile.gff99 -minlenltr 100 \
> -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > seqfile.result99
> No error, but no results either.
>
> Chao Chao
> 2017.03.15
-------------- next part --------------
An HTML attachment was scrubbed...

From mnaymik at tgen.org Wed Mar 15 11:54:48 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Wed, 15 Mar 2017 09:54:48 -0700
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To:
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
Message-ID:

Thanks, you're right. I had to recompile blast from src with this flag: -std=c++0x

On Tue, Mar 14, 2017 at 10:36 AM, Carson Holt wrote:

> The error itself is coming from BLAST. MAKER does provide the command used, so you can try it outside of MAKER. You can submit the files used as well as command used to the BLAST developers for them to test with.
>
> MAKER deletes files on failure, but if you edit .../maker/lib/GI.pm, you can stop it from deleting files.
>
> Edit line 58 by setting CLEANUP => 0
>
> Then you should be able to grab whatever files maker used to run blast, and copy the blast command used from STDERR.
> > ?Carson > > > > On Mar 14, 2017, at 11:29 AM, Marcus Naymik wrote: > > I have now tried with multiple versions of blast (2.6 and 2.28 binaries > and built from source) and get the same error: > > setting up GFF3 output and fasta chunks > > doing blastn of ESTs > > running blast search. > > #--------- command -------------# > > Widget::blastn: > > /home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db > /scratch/mnaymik/maker/tmp/maker_cah > > #-------------------------------# > > Error: NCBI C++ Exception: > > "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", > line 925: Cr > > > Error: NCBI C++ Exception: > > "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", > line 925: Cr > > > examining contents of the fasta file and run log > > ERROR: BLASTN failed > > --> rank=87, hostname=pnap-pe7-s09 > > ERROR: Failed while doing blastn of ESTs > > ERROR: Chunk failed at level:0, tier_type:3 > > FAILED CONTIG:6537645 > > > ERROR: BLASTN failed > > --> rank=88, hostname=pnap-pe7-s09 > > ERROR: Failed while doing blastn of ESTs > > ERROR: Chunk failed at level:0, tier_type:3 > > FAILED CONTIG:6537659 > > > ERROR: Chunk failed at level:4, tier_type:0 > > FAILED CONTIG:6537645 > > > ERROR: Chunk failed at level:4, tier_type:0 > > FAILED CONTIG:6537659 > > > > On Thu, Mar 2, 2017 at 1:25 PM, Carson Holt wrote: > >> Try reinstalling blast, or upgrade to a newer version of blast. >> >> ?Carson >> >> >> On Mar 2, 2017, at 1:05 PM, Marcus Naymik wrote: >> >> >> I have maker running with MPI and I get this error over and over again >> for every contig. Any Ideas? >> >> >> MAKER WARNING: All old files will be erased before continuing >> >> #--------------------------------------------------------------------- >> >> Now starting the contig!! 
>> >> SeqID: 5239 >> >> Length: 1395 >> >> #--------------------------------------------------------------------- >> >> >> >> Error: NCBI C++ Exception: >> >> "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", >> line 925: Criti >> >> >> >> *This electronic message is intended to be for the use only of the named >> recipient, and may contain information that is confidential or privileged, >> including patient health information. If you are not the intended >> recipient, you are hereby notified that any disclosure, copying, >> distribution or use of the contents of this message is strictly prohibited. >> If you have received this message in error or are not the named recipient, >> please notify us immediately by contacting the sender at the electronic >> mail address noted above, and delete and destroy all copies of this >> message. Thank you.* >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > *This electronic message is intended to be for the use only of the named > recipient, and may contain information that is confidential or privileged, > including patient health information. If you are not the intended > recipient, you are hereby notified that any disclosure, copying, > distribution or use of the contents of this message is strictly prohibited. > If you have received this message in error or are not the named recipient, > please notify us immediately by contacting the sender at the electronic > mail address noted above, and delete and destroy all copies of this > message. Thank you.* > > > -- *This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. 
If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Mar 15 12:00:18 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 15 Mar 2017 11:00:18 -0600 Subject: [maker-devel] ThrowNullPointerException() In-Reply-To: References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com> Message-ID: <6A6C819F-D903-401A-8522-29FEBC955F17@gmail.com> Glad I could help. Remember to switch back CLEANUP => 1 if you set it to 0 to debug. Otherwise you will have a lot of files left in /tmp after each MAKER run. ?Carson > On Mar 15, 2017, at 10:54 AM, Marcus Naymik wrote: > > Thanks, you're right. I had to recompile blast from src with this flag: -std=c++0x > > On Tue, Mar 14, 2017 at 10:36 AM, Carson Holt > wrote: > The error itself is coming from BLAST. MAKER does provide the command used, so you can try it outside of MAKER. You can submit the files used as well as command used to the BLAST developers for them to test with. > > MAKER deletes files on failure, but if you edit the ?/maker/lib/GI.pm, you can stop it from deleting files. > > Edit line 58 by setting CLEANUP => 0 > > Then you should be able to grab whatever files maker used to run blast, and copy the blast command used from STDERR. > > ?Carson > > > >> On Mar 14, 2017, at 11:29 AM, Marcus Naymik > wrote: >> >> I have now tried with multiple versions of blast (2.6 and 2.28 binaries and built from source) and get the same error: >> >> setting up GFF3 output and fasta chunks >> >> doing blastn of ESTs >> >> running blast search. 
>> >> #--------- command -------------# >> >> Widget::blastn: >> >> /home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db /scratch/mnaymik/maker/tmp/maker_cah >> >> #-------------------------------# >> >> Error: NCBI C++ Exception: >> >> "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr >> >> >> >> Error: NCBI C++ Exception: >> >> "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr >> >> >> >> examining contents of the fasta file and run log >> >> ERROR: BLASTN failed >> >> --> rank=87, hostname=pnap-pe7-s09 >> >> ERROR: Failed while doing blastn of ESTs >> >> ERROR: Chunk failed at level:0, tier_type:3 >> >> FAILED CONTIG:6537645 >> >> >> >> ERROR: BLASTN failed >> >> --> rank=88, hostname=pnap-pe7-s09 >> >> ERROR: Failed while doing blastn of ESTs >> >> ERROR: Chunk failed at level:0, tier_type:3 >> >> FAILED CONTIG:6537659 >> >> >> >> ERROR: Chunk failed at level:4, tier_type:0 >> >> FAILED CONTIG:6537645 >> >> >> >> ERROR: Chunk failed at level:4, tier_type:0 >> >> FAILED CONTIG:6537659 >> >> >> >> >> On Thu, Mar 2, 2017 at 1:25 PM, Carson Holt > wrote: >> Try reinstalling blast, or upgrade to a newer version of blast. >> >> ?Carson >> >> >>> On Mar 2, 2017, at 1:05 PM, Marcus Naymik > wrote: >>> >>> >>> I have maker running with MPI and I get this error over and over again for every contig. Any Ideas? >>> >>> >>> >>> MAKER WARNING: All old files will be erased before continuing >>> >>> #--------------------------------------------------------------------- >>> >>> Now starting the contig!! 
>>> >>> SeqID: 5239 >>> >>> Length: 1395 >>> >>> #--------------------------------------------------------------------- >>> >>> >>> >>> >>> >>> Error: NCBI C++ Exception: >>> >>> "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti >>> >>> >>> >>> >>> >>> This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you. >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you. >> > > > > This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. 
If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jiangn at msu.edu Wed Mar 15 10:56:30 2017 From: jiangn at msu.edu (Jiang, Ning) Date: Wed, 15 Mar 2017 15:56:30 +0000 Subject: [maker-devel] For help about masking repeats before annotation In-Reply-To: <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com> References: <2017030519265949065818@cau.edu.cn> <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com> <201703152048212561203@cau.edu.cn>, <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com> Message-ID: Hi Chao Chao, I guess you have an extra "\" in your second command. We put that sign there to indicate the entire thing belong to one command (it is too long to put in one row). I suggest you remove the "\" and try again. Good luck! Ning Jiang ________________________________ From: Michael Campbell Sent: Wednesday, March 15, 2017 11:00:13 AM To: dcg at cau.edu.cn Cc: maker-devel; Jiang, Ning; Kevin Childs Subject: Re: [maker-devel] For help about masking repeats before annotation Hi Chao Chao, I?m not sure how to trouble shoot this if there were no error messages. I?ve ccd a couple of people that have worked with this protocol much more than I have. Ning and Kevin, Do you have any tips for running these tools that may help Chao Chao? Thanks, Mike On Mar 15, 2017, at 8:48 AM, dcg at cau.edu.cn wrote: Thank for your reply! 
I just followed the guide at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced

To use LTRharvest, my commands are as below (the file names were set to my liking):

DIR1/gt suffixerator -db seqfile -indexname seqfileindex -tis -suf -lcp -des -ssp -dna
DIR1/gt ltrharvest -index seqfileindex -out seqfile.out99 -outinner seqfile.outinner99 -gff3 seqfile.gff99 -minlenltr 100 \ -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > seqfile.result99

No error, but no results either.

Chao Chao
________________________________
2017.03.15

From: Michael Campbell
Date: 2017-03-15 10:27
To: dcg
CC: maker-devel
Subject: Re: [maker-devel] For help about masking repeats before annotation

Hi Chao Chao,

I've not run into this before. Could you post the RepeatModeler command you used?

Thanks,
Mike

On Mar 5, 2017, at 6:26 AM, dcg at cau.edu.cn wrote:

Dear sir:

Before the MAKER operations, I first do repeat masking on my contigs. However, when I followed "Repeat Library Construction-Advanced", no results were generated after running LTRharvest, so I couldn't go any further. When I attempted to follow "Repeat Library Construction-Basic" to run RepeatModeler, a note caught my attention, even though RECON returned some results:

NOTE: RepeatScout did not return any models.

Is the situation above normal in the masking process? How can I deal with these problems to make a high-quality repeat library for my assembled contigs?

Hope to hear from you.
Best wishes!
Chao Chao
________________________________
2017.03.05
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Mar 16 10:19:02 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 16 Mar 2017 09:19:02 -0600
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To:
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu> <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com> <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com> <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com> <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com> <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com>
Message-ID: <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>

Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR for example (or hint based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied. Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but are only there for reference purposes (there will also be a raw fasta from each source, but only for reference purposes). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail in the MAKER wiki and the MAKER2 paper for how models compete.

So the final result is not a superset, rather a merged subset from each potential source.

EVM is not used to obtain a consensus gene model. Its results compete just like all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn't work it produces either no model or partial models.

--Carson

> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>
> Dear Carson,
>
> thank you so much! I am now peeking into the results for the finished scaffolds. In the gff file, the gene id confuses me a bit.
> In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm", "augustus" etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?
>
> Best Regards,
> Ray
>
> Dr. Rongfeng (Ray) Cui
> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
> Wissenschaftlicher MA / Postdoctoral researcher
> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
> Tel.:+49 (0)221 496
> Mobile: +49 0221 37970 496
> rcui at age.mpg.de
> www.age.mpg.de
>
> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
> Maybe. I haven't tested this, but it should work. Maker supports labels for input by placing a ":" and a label after each file name.
>
> Example -->
> est=file1.fasta:label_1,file2.fasta:label_2
>
> If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.
>
> As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3 -->
> evmtrans:est2genome:label_1=10
>
> I don't know if the label will force any raw analysis to rerun, but it shouldn't.
>
> --Carson
>
>> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote:
>>
>> Hi Carson,
>>
>> currently I am partitioning the protein evidence based on phylogenetic relationship into several datasets, supplied as comma delimited list. Is it possible then to specify higher weight for protein2genome models from closer related species than further related taxa?
>>
>> Ray
>>
>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui wrote:
>> Dear Carson,
>>
>> thank you for the pointers! Before running the first round of Maker, I mapped conspecific Trinity assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained Genemark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see.
>>
>> It is good to know that MAKER will reuse the old results.
>>
>> Best Regards,
>> Ray
>>
>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt wrote:
>> You can find lots of info in the devel archives on training. Example --> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>>
>> Also example of training SNAP on the wiki --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>>
>> MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them.
>> The exceptions to this are the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. Also they are not archived because doing so created IO issues (exonerate is not running in bulk batches like BLAST, rather as multiple small separate runs for each polished read, and archiving a lot of small raw reports can occur so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate rather than develop a database-like bundling/compression mechanism to get around the IO issues.
>>
>> Thanks,
>> Carson
>>
>>> On Mar 14, 2017, at 10:44 AM, Ray Cui wrote:
>>>
>>> Hi Carson,
>>> Thanks for your prompt response!
>>>
>>> I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models?
>>> After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc) in the second iteration, since they are exactly the same as the first run. Perhaps maker can take in the alignment results from the previous run?
>>>
>>> Best Regards,
>>> Ray
>>>
>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui wrote:
>>> I see.
>>> If my evm config looks like this:
>>> evmab=5 #default weight for source unspecified ab initio predictions
>>> evmab:snap=5 #weight for snap sourced predictions
>>> evmab:augustus=10 #weight for augustus sourced predictions
>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions
>>> evmab:genemark=5 #weight for genemark sourced predictions
>>>
>>> and Column 2 in the genemark.gff is "GeneMark.hmm", then the value from "evmab" (=5) will be used, is that correct?
>>>
>>> Best Regards,
>>> Ray
>>>
>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt wrote:
>>> Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. That column will also be used by EVM to bin features by their source and apply weights based on source.
>>>
>>> --Carson
>>>
>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui wrote:
>>>>
>>>> Thanks! I didn't know you can also name the gff, but I think using the default is fine, that's what I'm doing now.
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt wrote:
>>>>
>>>> These are set in the maker_evm.ctl file.
>>>>
>>>> Use whatever you used in the source column of the input GFF3. For example if column 2 is set as GENEMARK, then do this -->
>>>> evmab:GENEMARK=7
>>>>
>>>> This also works -->
>>>> evmab:pred_gff:GENEMARK=7
>>>>
>>>> Or just set the default -->
>>>> evmab=7
>>>>
>>>> --Carson
>>>>
>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui wrote:
>>>>>
>>>>> Dear Carson,
>>>>>
>>>>> I think it may be the most straightforward to input the GFF3 instead.
>>>>>
>>>>> What is the correct way of setting a weight for the EVM step for the GFF3 models passed through the pred_gff option?
>>>>>
>>>>> Ray
>>>>>
>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt wrote:
>>>>> It may work as is as long as you don't need any of the additional options that have been added. If not, you can also just run it outside of MAKER then provide the result in GFF3 format to pred_gff.
>>>>>
>>>>> --Carson
>>>>>
>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui wrote:
>>>>>>
>>>>>> I see. Are there any near-term plans to incorporate it into Maker?
>>>>>>
>>>>>> If not, I could try to see if I can adapt the current Maker script.
>>>>>>
>>>>>> Ray
>>>>>>
>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt wrote:
>>>>>> Yes. This is a recent update. It's an attempt to merge GeneMark-ET and GeneMark-EP into GeneMark-ES scripts.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui wrote:
>>>>>>>
>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap.
>>>>>>>
>>>>>>> I think there must have been a big update between different Genemark versions. It seems that they now also support evidence being fed into the prediction stage.
>>>>>>>
>>>>>>> The name of the latest version of the genemark script has been changed to "gmes_petap.pl", with the following command line options:
>>>>>>>
>>>>>>> Usage: /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] --sequence [filename]
>>>>>>>
>>>>>>> GeneMark-ES Suite version 4.33
>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
>>>>>>>
>>>>>>> Input sequence/s should be in FASTA format
>>>>>>>
>>>>>>> Algorithm options
>>>>>>>   --ES  to run self-training
>>>>>>>   --fungus  to run algorithm with branch point model (most useful for fungal genomes)
>>>>>>>   --ET [filename];  to run training with introns coordinates from RNA-Seq read alignments (GFF format)
>>>>>>>   --et_score [number];  4 (default) minimum score of intron in initiation of the ET algorithm
>>>>>>>   --evidence [filename];  to use in prediction external evidence (RNA or protein) mapped to genome
>>>>>>>   --training_only  to run only training step
>>>>>>>   --prediction_only  to run only prediction step
>>>>>>>   --predict_with [filename];  predict genes using this file species specific parameters (bypass regular training and prediction steps)
>>>>>>>
>>>>>>> Sequence pre-processing options
>>>>>>>   --max_contig [number];  5000000 (default) will split input genomic sequence into contigs shorter then max_contig
>>>>>>>   --min_contig [number];  50000 (default); will ignore contigs shorter then min_contig in training
>>>>>>>   --max_gap [number];  5000 (default); will split sequence at gaps longer than max_gap
>>>>>>>       Letters 'n' and 'N' are interpreted as standing within gaps
>>>>>>>   --max_mask [number];  5000 (default); will split sequence at repeats longer then max_mask
>>>>>>>       Letters 'x' and 'X' are interpreted as results of hard masking of repeats
>>>>>>>   --soft_mask [number]  to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
>>>>>>>
>>>>>>> Run options
>>>>>>>   --cores [number];  1 (default) to run program with multiple threads
>>>>>>>   --pbs  to run on cluster with PBS support
>>>>>>>   --v  verbose
>>>>>>>
>>>>>>> Customizing parameters:
>>>>>>>   --max_intron [number];  default 10000 (3000 fungi), maximum length of intron
>>>>>>>   --max_intergenic [number];  default 10000, maximum length of intergenic regions
>>>>>>>   --min_gene_prediction [number];  default 300 (120 fungi) minimum allowed gene length in prediction step
>>>>>>>
>>>>>>> Developer options:
>>>>>>>   --usr_cfg [filename];  to customize configuration file
>>>>>>>   --ini_mod [filename];  use this file with parameters for algorithm initiation
>>>>>>>   --test_set [filename];  to evaluate prediction accuracy on the given test set
>>>>>>>   --key_bin
>>>>>>>   --debug
>>>>>>> # -------------------
>>>>>>>
>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt wrote:
>>>>>>> Also note that the gmhmme3 executable distributed with different flavors of genemark has had the same name but has been quite different in both command line structure and output between flavors.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui wrote:
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters automatically set by Maker when calling Genemark?
>>>>>>>> If you can point me to the part of the maker source code that constructs the final genemark command line I can also take a look.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Ray
>>>>>>>>
>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt wrote:
>>>>>>>> The names of scripts used are listed in the maker_exe.ctl file. It depends on if formatting or any flags have changed between versions.
>>>>>>>>
>>>>>>>> --Carson
>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui wrote:
>>>>>>>>>
>>>>>>>>> Dear Carson,
>>>>>>>>>
>>>>>>>>> I have now run GeneMark-ET, and it produces a trained .mod file. I think it can then be passed to Maker. Do you know what is the final constructed command line in Maker that calls genemark? Genemark-et and es use the same perl script so one probably only needs to use the --prediction and --predict_with xxx.mod options to predict genes using the species specific parameters (bypassing regular training and prediction steps)
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Ray
>>>>>>>>>
>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt wrote:
>>>>>>>>> MAKER support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER related archive posts etc. will be related to GeneMark-ES.
>>>>>>>>>
>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don't know how this compares to GeneMark-ET.
>>>>>>>>>
>>>>>>>>> --Carson
>>>>>>>>>
>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> thanks! It seems that Genemark-ET has a "--training" flag, is that the flag I should use when training or should I just let Genemark also perform the prediction?
>>>>>>>>>>
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel wrote:
>>>>>>>>>> Hi Ray,
>>>>>>>>>>
>>>>>>>>>> I think you're on the right track with training Genemark with RNAseq data. It should only change the training steps, which are external to MAKER, but not how MAKER runs Genemark. You'll still give MAKER the path to the "es.mod" file made by Genemark.
>>>>>>>>>>
>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence.
>>>>>>>>>>
>>>>>>>>>> ~Daniel
>>>>>>>>>>
>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I have successfully installed Maker beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictors.
>>>>>>>>>>> When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate Genemark-ET predictions into Maker. Should I run Genemark-ET independent of Maker, then integrate the GFF at some point during the maker process? If so, how should I edit the configuration file? Currently maker has an option called "gmhmm". Should I then train GeneMark by myself with RNAseq data, then feed the hmm to maker?
>>>>>>>>>>>
>>>>>>>>>>> And perhaps an unrelated question is that now Maker beta 3 supports EVM. I'm wondering how EVM is used by Maker (at which step, what does it do), and how it differs from what Maker is designed for (both reconcile different gene models).
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Ray
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> maker-devel mailing list
>>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rcui at age.mpg.de Thu Mar 16 11:02:08 2017
From: rcui at age.mpg.de (Ray Cui)
Date: Thu, 16 Mar 2017 17:02:08 +0100
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To: <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu> <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com> <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com> <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com> <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com> <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com> <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
Message-ID:

Dear Carson,

thank you for the explanation! Now I see why sometimes it seems that EVM doesn't produce any model for a particular cluster.

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile: +49 0221 37970 496
rcui at age.mpg.de
www.age.mpg.de

On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt wrote:

> Final results with source maker will be of type gene/mRNA/exon/CDS. They
> have been further processed beyond the raw results, and may include
> extensions such as the addition of UTR for example (or hint based
> recomputation in the case of SNAP and Augustus). The gene ID of the maker
> model will let you know the source before additional processing was
> applied. Raw results will also be in the file as type match/match_part and
> source evm/snap/augustus, but are only there for reference purposes (there
> will also be a raw fasta from each source, but only for reference
> purposes). All models compete against each other, and the one best matching
> the evidence is kept. So if SNAP or Augustus scores better than EVM, then
> that model will be kept for that locus.
> You can find more detail in the
> MAKER wiki and the MAKER2 paper for how models compete.
>
> So the final result is not a superset, rather a merged subset from each
> potential source.
>
> EVM is not used to obtain a consensus gene model. Its results compete just
> like all other algorithms. This is because when EVM works it produces
> beautiful models that score really well, but when it doesn't work it
> produces either no model or partial models.
>
> --Carson
>
> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>
> Dear Carson,
>
> thank you so much! I am now peeking into the results for the
> finished scaffolds. In the gff file, the gene id confuses me a bit. In this
> file, column 2 is always "maker", but the "ID" attribute in the annotation
> is prefixed with "snap", "maker", "evm", "augustus" etc. Does that mean
> the final annotation is a superset of all gene predictors? If EVM was used
> to obtain a consensus gene model, why would the other models still show up
> in the final result set?
>
> Best Regards,
> Ray
>
> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
>
>> Maybe. I haven't tested this, but it should work. Maker supports labels
>> for input by placing a ":" and a label after each file name.
>>
>> Example -->
>> est=file1.fasta:label_1,file2.fasta:label_2
>>
>> If you label your files, then the label will go into the GFF3. So instead
>> of est2genome in column 2, you will get est2genome:label_1 in column 2.
>>
>> As a result, you should be able to add that label to the EVM settings
>> like so and it will match column 2 of the GFF3 -->
>> evmtrans:est2genome:label_1=10
>>
>> I don't know if the label will force any raw analysis to rerun, but
>> it shouldn't.
>>
>> --Carson
>>
>> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote:
>>
>> Hi Carson,
>>
>> currently I am partitioning the protein evidence based on
>> phylogenetic relationship into several datasets, supplied as comma
>> delimited list. Is it possible then to specify higher weight for
>> protein2genome models from closer related species than further related taxa?
>>
>> Ray
>>
>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui wrote:
>>
>>> Dear Carson,
>>>
>>> thank you for the pointers! Before running the first round of
>>> Maker, I mapped conspecific Trinity assembled proteins (long, "full length"
>>> subset) to an earlier version of the genome assembly using my own pipeline
>>> and trained Augustus and SNAP that way. I also trained Genemark-ET using
>>> TopHat alignments per their instructions. I'm wondering if it will be worth
>>> doing a second round, but I guess I will see.
>>>
>>> It is good to know that MAKER will reuse the old results.
>>>
>>> Best Regards,
>>> Ray
>>>
>>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt wrote:
>>>
>>>> You can find lots of info in the devel archives on training. Example -->
>>>> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>>>>
>>>> Also example of training SNAP on the wiki -->
>>>> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>>>>
>>>> MAKER will reuse old raw results if you rerun in the same directory
>>>> (only deleting what would be different given altered settings between
>>>> runs). It will see the existing alignments archived in the datastore as raw
>>>> reports and just reuse them. The exceptions to this are the exonerate
>>>> alignments. They are generated relatively quickly compared to the BLAST
>>>> runs, so rerunning them is not too much overhead. Also they are not
>>>> archived because doing so created IO issues (exonerate is not running in
>>>> bulk batches like BLAST, rather as multiple small separate runs for each
>>>> polished read, and archiving a lot of small raw reports can occur so fast
>>>> when using MPI that it crashes storage servers). So we decided to just not
>>>> archive exonerate rather than develop a database-like bundling/compression
>>>> mechanism to get around the IO issues.
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>> On Mar 14, 2017, at 10:44 AM, Ray Cui wrote:
>>>>
>>>> Hi Carson,
>>>> Thanks for your prompt response!
>>>>
>>>> I have a somewhat unrelated question.
>>>> After the first run of
>>>> Maker, I want to train Augustus, SNAP and Genemark-ET using the most
>>>> reliable gene models produced in the first round. What would be a good way
>>>> to select these gene models?
>>>> After retraining the ab initio predictors, I also wonder if
>>>> it's necessary to redo all the alignments (blastx, est2genome,
>>>> protein2genome etc) in the second iteration, since they are exactly the
>>>> same as the first run. Perhaps maker can take in the alignment results from
>>>> the previous run?
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui wrote:
>>>>
>>>>> I see. If my evm config looks like this:
>>>>> evmab=5 #default weight for source unspecified ab initio predictions
>>>>> evmab:snap=5 #weight for snap sourced predictions
>>>>> evmab:augustus=10 #weight for augustus sourced predictions
>>>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions
>>>>> evmab:genemark=5 #weight for genemark sourced predictions
>>>>>
>>>>> and Column 2 in the genemark.gff is "GeneMark.hmm", then the value
>>>>> from "evmab" (=5) will be used, is that correct?
>>>>>
>>>>> Best Regards,
>>>>> Ray
>>>>>
>>>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt wrote:
>>>>>
>>>>>> Column 2 in the GFF3 file is the source column. It is used to specify
>>>>>> the source of the data. That column will also be used by EVM to bin
>>>>>> features by their source and apply weights based on source.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui wrote:
>>>>>>
>>>>>> Thanks! I didn't know you can also name the gff, but I think using
>>>>>> the default is fine, that's what I'm doing now.
>>>>>>
>>>>>> Best Regards,
>>>>>> Ray
>>>>>>
>>>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt wrote:
>>>>>>
>>>>>>> These are set in the maker_evm.ctl file.
>>>>>>>
>>>>>>> Use whatever you used in the source column of the input GFF3.
For >>>>>>> example if column 2 is set as GENEMARK, then do this ?> >>>>>>> evmab:GENEMARK=7 >>>>>>> >>>>>>> This also works ?> >>>>>>> evmab:pred_gff:GENEMARK=7 >>>>>>> >>>>>>> Or just set the default ?> >>>>>>> evmab=7 >>>>>>> >>>>>>> ?Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui wrote: >>>>>>> >>>>>>> Dear Carson, >>>>>>> >>>>>>> I think it may be the most straight foward to input the GFF3 >>>>>>> instead. >>>>>>> >>>>>>> What is the correct way of setting a weight for the EVM step >>>>>>> for this GFF3 models passed through the pred_gff option? >>>>>>> >>>>>>> Ray >>>>>>> >>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute >>>>>>> for Biology of Ageing >>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>> Mobile: +49 0221 37970 496 >>>>>>> rcui at age.mpg.de >>>>>>> www.age.mpg.de >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt >>>>>>> wrote: >>>>>>> >>>>>>>> It may work as is as long as you don?t need any of the additional >>>>>>>> options that have been added. If not, you can also just run it outside of >>>>>>>> MAKER then provide the result in GFF3 format to pred_gff. >>>>>>>> >>>>>>>> ?Carson >>>>>>>> >>>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui wrote: >>>>>>>> >>>>>>>> I see. Is there any recent plans to incorporate it into Maker? >>>>>>>> >>>>>>>> If not, I could try to see if I can adapt the current Maker script. >>>>>>>> >>>>>>>> Ray >>>>>>>> >>>>>>>> Dr. 
Rongfeng (Ray) Cui >>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute >>>>>>>> for Biology of Ageing >>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>> rcui at age.mpg.de >>>>>>>> www.age.mpg.de >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Yes. This is a recent update. It?s an attempt to merge GeneMark-ET >>>>>>>>> and GeneMark-EP into GeneMark-ES scripts. >>>>>>>>> >>>>>>>>> ?Carson >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui wrote: >>>>>>>>> >>>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap. >>>>>>>>> >>>>>>>>> I think there must have been a big update between different >>>>>>>>> Genemark versions. It seems that they now also supports evidence being fed >>>>>>>>> into the prediction stage. 
>>>>>>>>> >>>>>>>>> The name of the latest version of the genemark script has been >>>>>>>>> changed to "gmes_petap.pl", with the following command lines >>>>>>>>> options: >>>>>>>>> >>>>>>>>> Usage: /beegfs/group_dv/software/sou >>>>>>>>> rce/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] >>>>>>>>> --sequence [filename] >>>>>>>>> >>>>>>>>> GeneMark-ES Suite version 4.33 >>>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) >>>>>>>>> based training and prediction >>>>>>>>> >>>>>>>>> Input sequence/s should be in FASTA format >>>>>>>>> >>>>>>>>> Algorithm options >>>>>>>>> --ES to run self-training >>>>>>>>> --fungus to run algorithm with branch point model (most >>>>>>>>> useful for fungal genomes) >>>>>>>>> --ET [filename]; to run training with introns >>>>>>>>> coordinates from RNA-Seq read alignments (GFF format) >>>>>>>>> --et_score [number]; 4 (default) minimum score of intron in >>>>>>>>> initiation of the ET algorithm >>>>>>>>> --evidence [filename]; to use in prediction external >>>>>>>>> evidence (RNA or protein) mapped to genome >>>>>>>>> --training_only to run only training step >>>>>>>>> --prediction_only to run only prediction step >>>>>>>>> --predict_with [filename]; predict genes using this file species >>>>>>>>> specific parameters (bypass regular training and prediction steps) >>>>>>>>> >>>>>>>>> Sequence pre-processing options >>>>>>>>> --max_contig [number]; 5000000 (default) will split input >>>>>>>>> genomic sequence into contigs shorter then max_contig >>>>>>>>> --min_contig [number]; 50000 (default); will ignore contigs >>>>>>>>> shorter then min_contig in training >>>>>>>>> --max_gap [number]; 5000 (default); will split sequence at >>>>>>>>> gaps longer than max_gap >>>>>>>>> Letters 'n' and 'N' are interpreted as standing >>>>>>>>> within gaps >>>>>>>>> --max_mask [number]; 5000 (default); will split sequence at >>>>>>>>> repeats longer then max_mask >>>>>>>>> Letters 'x' and 'X' are interpreted as results of >>>>>>>>> 
hard masking of repeats >>>>>>>>> --soft_mask [number] to indicate that lowercase letters stand >>>>>>>>> for repeats; utilize only lowercase repeats longer than specified length >>>>>>>>> >>>>>>>>> Run options >>>>>>>>> --cores [number]; 1 (default) to run program with >>>>>>>>> multiple threads >>>>>>>>> --pbs to run on cluster with PBS support >>>>>>>>> --v verbose >>>>>>>>> >>>>>>>>> Customizing parameters: >>>>>>>>> --max_intron [number]; default 10000 (3000 fungi), >>>>>>>>> maximum length of intron >>>>>>>>> --max_intergenic [number]; default 10000, maximum length of >>>>>>>>> intergenic regions >>>>>>>>> --min_gene_prediction [number]; default 300 (120 fungi) minimum >>>>>>>>> allowed gene length in prediction step >>>>>>>>> >>>>>>>>> Developer options: >>>>>>>>> --usr_cfg [filename]; to customize configuration file >>>>>>>>> --ini_mod [filename]; use this file with parameters for >>>>>>>>> algorithm initiation >>>>>>>>> --test_set [filename]; to evaluate prediction accuracy on >>>>>>>>> the given test set >>>>>>>>> --key_bin >>>>>>>>> --debug >>>>>>>>> # ------------------- >>>>>>>>> >>>>>>>>> >>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck >>>>>>>>> Institute for Biology of Ageing >>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>>> rcui at age.mpg.de >>>>>>>>> www.age.mpg.de >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Also note that the gmhmme3 executable distributed with different >>>>>>>>>> flavors of genemark has had the same name but has been quite different in >>>>>>>>>> both command line structure and output between flavors. 
>>>>>>>>>> >>>>>>>>>> ?Carson >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui wrote: >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters >>>>>>>>>> automatically set by Maker when calling Genemark? >>>>>>>>>> If you can point me to the part of the maker source code that >>>>>>>>>> construct the final genemark command line I can also take a look. >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Ray >>>>>>>>>> >>>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck >>>>>>>>>> Institute for Biology of Ageing >>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>>>> rcui at age.mpg.de >>>>>>>>>> www.age.mpg.de >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt >>>>>>>>> > wrote: >>>>>>>>>> >>>>>>>>>>> The names of scripts used are listed in the maker_exe.ctl file. >>>>>>>>>>> It depends on if formatting or any flags have changed between versions. >>>>>>>>>>> >>>>>>>>>>> ?Carson >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui wrote: >>>>>>>>>>> >>>>>>>>>>> Dear Carson, >>>>>>>>>>> >>>>>>>>>>> I have now run GeneMark-ET, and it produces a trained >>>>>>>>>>> .mod file. I think it can be then passed to Maker. Do you know what is the >>>>>>>>>>> final constructed command line in Maker that calls genemark? 
Genemark-et >>>>>>>>>>> and es use the same perl script so one probably only needs to use the >>>>>>>>>>> --prediction and --predict_with xxx.mod options to predict genes using >>>>>>>>>>> the species specific parameters (bypassing regular training and prediction >>>>>>>>>>> steps) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Best Regards, >>>>>>>>>>> Ray >>>>>>>>>>> >>>>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck >>>>>>>>>>> Institute for Biology of Ageing >>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>>>>> rcui at age.mpg.de >>>>>>>>>>> www.age.mpg.de >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt >>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>>> MAKER was support was designed with GeneMark-ES. It may or may >>>>>>>>>>>> not work with GeneMark-ET. So any MAKER related archive posts etc. will be >>>>>>>>>>>> related to the latter. >>>>>>>>>>>> >>>>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let >>>>>>>>>>>> it run. It would then produce several files and output directories. The >>>>>>>>>>>> es.mod file was the one you provided to MAKER. I don?t know how this >>>>>>>>>>>> compares to GeneMark-ET. >>>>>>>>>>>> >>>>>>>>>>>> ?Carson >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Daniel, >>>>>>>>>>>> >>>>>>>>>>>> thanks! It seems that Genemark-ET has a "--training" >>>>>>>>>>>> flag, is that the flag I should use when training or should I just let >>>>>>>>>>>> Genemark also perform the prediction? >>>>>>>>>>>> >>>>>>>>>>>> Ray >>>>>>>>>>>> >>>>>>>>>>>> Dr. 
Rongfeng (Ray) Cui >>>>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck >>>>>>>>>>>> Institute for Biology of Ageing >>>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>>>>>> rcui at age.mpg.de >>>>>>>>>>>> www.age.mpg.de >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Ray, >>>>>>>>>>>>> >>>>>>>>>>>>> I think you?re on the right track with training Genemark with >>>>>>>>>>>>> RNAseq data. It should only change the training steps, which are external >>>>>>>>>>>>> to MAKER, but not how MAKER runs Genemark. You?ll still give MAKER the path >>>>>>>>>>>>> to the ?es.mod" file made by Genemark. >>>>>>>>>>>>> >>>>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a >>>>>>>>>>>>> control file for EVM, in which you set your weights for the various inputs, >>>>>>>>>>>>> and then MAKER runs EVM alongside all the other gene predictors and chooses >>>>>>>>>>>>> the model that is best supported by the evidence. >>>>>>>>>>>>> >>>>>>>>>>>>> ~Daniel >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hello, >>>>>>>>>>>>> >>>>>>>>>>>>> I have sucessfully installed Maker beta 3, working >>>>>>>>>>>>> with both Augustus and SNAP. I also want to try adding GeneMark-ES to the >>>>>>>>>>>>> ab initio predictor. >>>>>>>>>>>>> When I read the GeneMark-ES manual, it says that one >>>>>>>>>>>>> can use RNAseq data to aid training. I'm wondering what would be the best >>>>>>>>>>>>> way to integrate Genemark-ET predictions into Maker. 
Should I run >>>>>>>>>>>>> Genemark-ET independent of Maker, then integrate the GFF at some point >>>>>>>>>>>>> during the maker process? If so, how should I edit the configuration file? >>>>>>>>>>>>> Currently maker has an option called "gmhmm". Should I then train GeneMark >>>>>>>>>>>>> by myself with RNAseq data, then feed the hmm to maker? >>>>>>>>>>>>> >>>>>>>>>>>>> And perhaps an unrelated question is that now Maker >>>>>>>>>>>>> beta 3 supports EVM. I'm wondering how EVM is used by Maker (at which step, >>>>>>>>>>>>> what does it do), and how does it differ from what Maker is designed for >>>>>>>>>>>>> (both reconciles different gene models). >>>>>>>>>>>>> >>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>> Ray >>>>>>>>>>>>> >>>>>>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck >>>>>>>>>>>>> Institute for Biology of Ageing >>>>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>>>>>>> rcui at age.mpg.de >>>>>>>>>>>>> www.age.mpg.de >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> maker-devel mailing list >>>>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yand >>>>>>>>>>>>> ell-lab.org >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> maker-devel mailing list >>>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yand >>>>>>>>>>>> ell-lab.org >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> > > 
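For readers following the pred_gff discussion in the thread above: EVM weights in maker_evm.ctl are matched against the GFF3 source column (column 2), so it can help to list which source values a file actually contains before choosing keys such as evmab:GENEMARK. The following is a minimal Python sketch of that check. The function name and the sample rows are illustrative assumptions and are not part of MAKER or GeneMark.

```python
from collections import Counter


def count_gff3_sources(lines):
    """Tally the values found in column 2 (the source column) of GFF3 lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            # An embedded FASTA section carries no feature rows.
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            # Not a complete nine-column feature row; skip it.
            continue
        counts[cols[1]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical rows, only to show the shape of the input.
    sample = [
        "##gff-version 3",
        "scaffold1\tGeneMark.hmm\tgene\t100\t900\t.\t+\t.\tID=g1",
        "scaffold1\tGeneMark.hmm\tCDS\t100\t900\t.\t+\t0\tID=g1.cds;Parent=g1",
        "scaffold2\tGENEMARK\tgene\t50\t400\t.\t-\t.\tID=g2",
    ]
    for source, n in sorted(count_gff3_sources(sample).items()):
        print(f"{source}\t{n}")
```

Each distinct value reported is a candidate for a source-specific key of the form evmab:<source>, following the pattern Carson gives in the thread.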
From carsonhh at gmail.com Thu Mar 16 12:30:16 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 16 Mar 2017 11:30:16 -0600
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To:
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu>
 <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com>
 <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com>
 <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com>
 <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com>
 <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com>
 <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
Message-ID:

1. Verify that the issue is not being caused by hints from evidence (i.e. that you aren't feeding in fused mRNA-seq assemblies or protein evidence). Fused evidence will result in hints that fuse models.

2. If you still have an issue, then drop SNAP. Not all predictors work well on all genomes.

Also, no one can post to the Google group. It's just for archival. All messages have to go to the mailing list here, and they then get archived on Google ->
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

The mailing list log shows that you requested to unsubscribe earlier today.

--Carson

> On Mar 16, 2017, at 11:22 AM, Ray Cui wrote:
>
> Hi Carson,
>
> For some reason I can't seem to post on the Google group anymore.
>
> After looking at the results, it appears that SNAP performs poorly compared to GeneMark-ET and Augustus. It looks very prone to fusing neighboring genes and producing false positives. Is that something you generally see in vertebrate genomes with SNAP? I saw that you didn't recommend SNAP for primates; perhaps the issue is similar?
>
> Attached is a screenshot of the IGV browser, with all evidence tracks separated.
>
> Ray
>
> On Thu, Mar 16, 2017 at 5:02 PM, Ray Cui wrote:
> Dear Carson,
>
> Thank you for the explanation! Now I see why it sometimes seems that EVM doesn't produce any model for a particular cluster.
>
> Best Regards,
> Ray
>
> On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt wrote:
> Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR (or hint-based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied. Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but they are only there for reference purposes (there will also be a raw fasta from each source, again only for reference). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail on how models compete in the MAKER wiki and the MAKER2 paper.
>
> So the final result is not a superset, but rather a merged subset from each potential source.
>
> EVM is not used to obtain a consensus gene model. Its results compete just like those of all the other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn't work it produces either no model or partial models.
>
> --Carson
>
>> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>>
>> Dear Carson,
>>
>> Thank you so much! I am now peeking into the results for the finished scaffolds. In the GFF file, the gene ID confuses me a bit. In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm", "augustus" etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?
>>
>> Best Regards,
>> Ray
>>
>> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
>> Maybe. I haven't tested this, but it should work. MAKER supports labels for input by placing a ":" and a label after each file name.
>>
>> Example ->
>> est=file1.fasta:label_1,file2.fasta:label_2
>>
>> If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.
>>
>> As a result, you should be able to add that label to the EVM settings like so, and it will match column 2 of the GFF3 ->
>> evmtrans:est2genome:label_1=10
>>
>> I don't know if the label will force any raw analysis to rerun, but it shouldn't.
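The labeling scheme described in the message above can be sketched in Python as follows. This only spells out the string convention from the email (file:label in the input setting, source:label in GFF3 column 2, and the matching evmtrans key); Carson notes the behaviour is untested, and the function name, the default arguments, and the assumption that the first ":" separates file name from label are illustrative, not MAKER's own code.

```python
def evm_keys_for_labeled_inputs(setting, source="est2genome", prefix="evmtrans"):
    """Map a 'file:label,file:label' value to (file, column-2 source, EVM key)."""
    rows = []
    for entry in setting.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Assumption: the first ':' separates the file name from its label.
        path, sep, label = entry.partition(":")
        column2 = f"{source}:{label}" if sep and label else source
        rows.append((path, column2, f"{prefix}:{column2}"))
    return rows


if __name__ == "__main__":
    for path, column2, key in evm_keys_for_labeled_inputs(
        "file1.fasta:label_1,file2.fasta:label_2"
    ):
        # The weight itself is left for the user to choose.
        print(f"{path}: column 2 = {column2}, maker_evm.ctl key = {key}")
```

Whether the same pattern carries over to protein evidence, which is what the original question asked about, is not confirmed anywhere in the thread.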
>>
>> --Carson
>>
>>> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote:
>>>
>>> Hi Carson,
>>>
>>> Currently I am partitioning the protein evidence by phylogenetic relationship into several datasets, supplied as a comma-delimited list. Is it then possible to specify a higher weight for protein2genome models from more closely related species than for those from more distantly related taxa?
>>>
>>> Ray
>>>
>>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui wrote:
>>> Dear Carson,
>>>
>>> Thank you for the pointers! Before running the first round of MAKER, I mapped conspecific Trinity-assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained GeneMark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see.
>>>
>>> It is good to know that MAKER will reuse the old results.
>>>
>>> Best Regards,
>>> Ray
>>>
>>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt wrote:
>>> You can find lots of info in the devel archives on training.
>>> Example -> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>>>
>>> Also an example of training SNAP on the wiki -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>>>
>>> MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them. The exception is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. Also, they are not archived because doing so created IO issues (exonerate does not run in bulk batches like BLAST, but as multiple small separate runs for each polished read, and archiving a lot of small raw reports can occur so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate rather than develop a database-like bundling/compression mechanism to get around the IO issues.
>>>
>>> Thanks,
>>> Carson
>>>
>>>> On Mar 14, 2017, at 10:44 AM, Ray Cui wrote:
>>>>
>>>> Hi Carson,
>>>> Thanks for your prompt response!
>>>>
>>>> I have a somewhat unrelated question. After the first run of MAKER, I want to train Augustus, SNAP and GeneMark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models?
>>>> After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc.) in the second iteration, since they are exactly the same as in the first run. Perhaps MAKER can take in the alignment results from the previous run?
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui wrote:
>>>> I see. If my EVM config looks like this:
>>>> evmab=5 #default weight for source unspecified ab initio predictions
>>>> evmab:snap=5 #weight for snap sourced predictions
>>>> evmab:augustus=10 #weight for augustus sourced predictions
>>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions
>>>> evmab:genemark=5 #weight for genemark sourced predictions
>>>>
>>>> and column 2 in the genemark.gff is "GeneMark.hmm", then the value from "evmab" (=5) will be used, is that correct?
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt wrote:
>>>> Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. That column will also be used by EVM to bin features by their source and apply weights based on source.
>>>>
>>>> --Carson
>>>>
>>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui wrote:
>>>>>
>>>>> Thanks! I didn't know you can also name the GFF, but I think using the default is fine; that's what I'm doing now.
>>>>>
>>>>> Best Regards,
>>>>> Ray
>>>>>
>>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt wrote:
>>>>>
>>>>> These are set in the maker_evm.ctl file.
>>>>>
>>>>> Use whatever you used in the source column of the input GFF3. For example, if column 2 is set as GENEMARK, then do this ->
>>>>> evmab:GENEMARK=7
>>>>>
>>>>> This also works ->
>>>>> evmab:pred_gff:GENEMARK=7
>>>>>
>>>>> Or just set the default ->
>>>>> evmab=7
>>>>>
>>>>> --Carson
>>>>>
>>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui wrote:
>>>>>>
>>>>>> Dear Carson,
>>>>>>
>>>>>> I think it may be most straightforward to input the GFF3 instead.
>>>>>>
>>>>>> What is the correct way of setting a weight for the EVM step for the GFF3 models passed through the pred_gff option?
>>>>>>
>>>>>> Ray
>>>>>>
>>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt wrote:
>>>>>> It may work as is, as long as you don't need any of the additional options that have been added. If not, you can also just run it outside of MAKER and then provide the result in GFF3 format to pred_gff.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui wrote:
>>>>>>>
>>>>>>> I see. Are there any recent plans to incorporate it into MAKER?
>>>>>>>
>>>>>>> If not, I could try to see if I can adapt the current MAKER script.
>>>>>>>
>>>>>>> Ray
>>>>>>>
>>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt wrote:
>>>>>>> Yes. This is a recent update. It's an attempt to merge GeneMark-ET and GeneMark-EP into the GeneMark-ES scripts.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui wrote:
>>>>>>>>
>>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap.
>>>>>>>>
>>>>>>>> I think there must have been a big update between different GeneMark versions. It seems that they now also support evidence being fed into the prediction stage.
>>>>>>>>
>>>>>>>> The name of the latest version of the GeneMark script has been changed to "gmes_petap.pl", with the following command-line options:
>>>>>>>>
>>>>>>>> Usage: /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] --sequence [filename]
>>>>>>>>
>>>>>>>> GeneMark-ES Suite version 4.33
>>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
>>>>>>>>
>>>>>>>> Input sequence/s should be in FASTA format
>>>>>>>>
>>>>>>>> Algorithm options
>>>>>>>>   --ES                            to run self-training
>>>>>>>>   --fungus                        to run algorithm with branch point model (most useful for fungal genomes)
>>>>>>>>   --ET [filename];                to run training with introns coordinates from RNA-Seq read alignments (GFF format)
>>>>>>>>   --et_score [number];            4 (default) minimum score of intron in initiation of the ET algorithm
>>>>>>>>   --evidence [filename];          to use in prediction external evidence (RNA or protein) mapped to genome
>>>>>>>>   --training_only                 to run only training step
>>>>>>>>   --prediction_only               to run only prediction step
>>>>>>>>   --predict_with [filename];      predict genes using this file species specific parameters (bypass regular training and prediction steps)
>>>>>>>>
>>>>>>>> Sequence pre-processing options
>>>>>>>>   --max_contig [number];          5000000 (default) will split input genomic sequence into contigs shorter then max_contig
>>>>>>>>   --min_contig [number];          50000 (default); will ignore contigs shorter then min_contig in training
>>>>>>>>   --max_gap [number];             5000 (default); will split sequence at gaps longer than max_gap
>>>>>>>>                                   Letters 'n' and 'N' are interpreted as standing within gaps
>>>>>>>>   --max_mask [number];            5000 (default); will split sequence at repeats longer then max_mask
>>>>>>>>                                   Letters 'x' and 'X' are interpreted as results of hard masking of repeats
>>>>>>>>   --soft_mask [number]            to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
>>>>>>>>
>>>>>>>> Run options
>>>>>>>>   --cores [number];               1 (default) to run program with multiple threads
>>>>>>>>   --pbs                           to run on cluster with PBS support
>>>>>>>>   --v                             verbose
>>>>>>>>
>>>>>>>> Customizing parameters:
>>>>>>>>   --max_intron [number];          default 10000 (3000 fungi), maximum length of intron
>>>>>>>>   --max_intergenic [number];      default 10000, maximum length of intergenic regions
>>>>>>>>   --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
>>>>>>>>
>>>>>>>> Developer options:
>>>>>>>>   --usr_cfg [filename];           to customize configuration file
>>>>>>>>   --ini_mod [filename];           use this file with parameters for algorithm initiation
>>>>>>>>   --test_set [filename];          to evaluate prediction accuracy on the given test set
>>>>>>>>   --key_bin
>>>>>>>>   --debug
>>>>>>>> # -------------------
>>>>>>>>
>>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt wrote:
>>>>>>>> Also note that the gmhmme3 executable distributed with different flavors of GeneMark has had the same name but has been quite different in both command-line structure and output between flavors.
>>>>>>>>
>>>>>>>> --Carson
>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui wrote:
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters automatically set by MAKER when calling GeneMark?
>>>>>>>>> If you can point me to the part of the MAKER source code that constructs the final GeneMark command line, I can also take a look.
>>>>>>>>> >>>>>>>>> Best Regards, >>>>>>>>> Ray >>>>>>>>> >>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>> Tel.:+49 (0)221 496 >>>>>>>>> Mobile: +49 0221 37970 496 <> >>>>>>>>> rcui at age.mpg.de >>>>>>>>> www.age.mpg.de >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt > wrote: >>>>>>>>> The names of scripts used are listed in the maker_exe.ctl file. It depends on if formatting or any flags have changed between versions. >>>>>>>>> >>>>>>>>> ?Carson >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui > wrote: >>>>>>>>>> >>>>>>>>>> Dear Carson, >>>>>>>>>> >>>>>>>>>> I have now run GeneMark-ET, and it produces a trained .mod file. I think it can be then passed to Maker. Do you know what is the final constructed command line in Maker that calls genemark? Genemark-et and es use the same perl script so one probably only needs to use the --prediction and --predict_with xxx.mod options to predict genes using the species specific parameters (bypassing regular training and prediction steps) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Ray >>>>>>>>>> >>>>>>>>>> Dr. 
Rongfeng (Ray) Cui >>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>> Tel.:+49 (0)221 496 >>>>>>>>>> Mobile: +49 0221 37970 496 <> >>>>>>>>>> rcui at age.mpg.de >>>>>>>>>> www.age.mpg.de >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt > wrote: >>>>>>>>>> MAKER was support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER related archive posts etc. will be related to the latter. >>>>>>>>>> >>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don?t know how this compares to GeneMark-ET. >>>>>>>>>> >>>>>>>>>> ?Carson >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Daniel, >>>>>>>>>>> >>>>>>>>>>> thanks! It seems that Genemark-ET has a "--training" flag, is that the flag I should use when training or should I just let Genemark also perform the prediction? >>>>>>>>>>> >>>>>>>>>>> Ray >>>>>>>>>>> >>>>>>>>>>> Dr. 
Rongfeng (Ray) Cui >>>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>>> Tel.:+49 (0)221 496 >>>>>>>>>>> Mobile: +49 0221 37970 496 <> >>>>>>>>>>> rcui at age.mpg.de >>>>>>>>>>> www.age.mpg.de >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel > wrote: >>>>>>>>>>> Hi Ray, >>>>>>>>>>> >>>>>>>>>>> I think you?re on the right track with training Genemark with RNAseq data. It should only change the training steps, which are external to MAKER, but not how MAKER runs Genemark. You?ll still give MAKER the path to the ?es.mod" file made by Genemark. >>>>>>>>>>> >>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence. >>>>>>>>>>> >>>>>>>>>>> ~Daniel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> I have sucessfully installed Maker beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictor. >>>>>>>>>>>> When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate Genemark-ET predictions into Maker. Should I run Genemark-ET independent of Maker, then integrate the GFF at some point during the maker process? If so, how should I edit the configuration file? Currently maker has an option called "gmhmm". Should I then train GeneMark by myself with RNAseq data, then feed the hmm to maker? 
>>>>>>>>>>>> >>>>>>>>>>>> And perhaps an unrelated question is that now Maker beta 3 supports EVM. I'm wondering how EVM is used by Maker (at which step, what does it do), and how does it differ from what Maker is designed for (both reconciles different gene models). >>>>>>>>>>>> >>>>>>>>>>>> Best Regards, >>>>>>>>>>>> Ray >>>>>>>>>>>> >>>>>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>>>> Tel.:+49 (0)221 496 >>>>>>>>>>>> Mobile: +49 0221 37970 496 <> >>>>>>>>>>>> rcui at age.mpg.de >>>>>>>>>>>> www.age.mpg.de >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> maker-devel mailing list >>>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> maker-devel mailing list >>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>> >>> >>> >> >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Thu Mar 16 22:48:10 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Thu, 16 Mar 2017 23:48:10 -0400 Subject: [maker-devel] split genes Message-ID: Hello: If one gene was covered by two contigs, sometimes we may predicted two genes. I wonder how Maker deal with such conditions? Even Maker tried to reduce such cases, they can not be completely avoid. 
So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?

As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the MAKER pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, could the predicted genes then be merged into one?

Thanks

Best
Quanwei

From carsonhh at gmail.com Fri Mar 17 10:21:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Mar 2017 09:21:10 -0600
Subject: [maker-devel] split genes
In-Reply-To: 
References: 
Message-ID: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>

MAKER will not try to predict a gene across contigs because it is too difficult to determine contig order. If you are able to determine the order, then it is best to merge the contigs into a single scaffold before annotating rather than try to produce split models in GFF3.

--Carson

> On Mar 16, 2017, at 9:48 PM, Quanwei Zhang wrote:
>
> Hello:
>
> If one gene is covered by two contigs, sometimes two genes may be predicted. I wonder how MAKER deals with such cases?
> Even if MAKER tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?
>
> As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the MAKER pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, could the predicted genes then be merged into one?
> > Thanks > > Best > Quanwei > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From qwzhang0601 at gmail.com Fri Mar 17 12:49:06 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Fri, 17 Mar 2017 13:49:06 -0400 Subject: [maker-devel] split genes In-Reply-To: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com> References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com> Message-ID: Thank you for your explanation. But do you have any suggestions on such issues? Is there any tools to detect such split genes or any other tool can even further improve the gene models obtained by Maker? Thanks. Best Quanwei 2017-03-17 11:21 GMT-04:00 Carson Holt : > MAKER will not try and predict a gene across contigs because it it too > difficult to determine contig order. If you are able to determine order, > then it is best to merge the contigs into a single scaffold before > annotating rather than try and produce split models in GFF3. > > ?Carson > > > On Mar 16, 2017, at 9:48 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > If one gene was covered by two contigs, sometimes we may predicted two > genes. I wonder how Maker deal with such conditions? > > Even Maker tried to reduce such cases, they can not be completely avoid. > So I wonder whether there is any way or any tool to find such split genes > (one gene split into two contigs and predicted as two genes)? > > > > As we know, we can also provide protein sequences and transcript > assembly as evidences. Can a protein sequence or transcript assembly rescue > the split genes in Maker pipe line? For example, if one transcript cover > 40% of predicted genes predicted in two contigs, then merge the predicted > genes into one? 
> > > > Thanks > > > > Best > > Quanwei > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Fri Mar 17 17:37:16 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Fri, 17 Mar 2017 18:37:16 -0400 Subject: [maker-devel] putative gene function by mapping to UniProt/Swiss-prot set Message-ID: Hello: I have a questions about the assigning putative gene function by mapping to UniProt/Swiss-prot gene set (described in the protocol published in 2014). Here, for each of the gene model from Maker, the pipeline will find the most similar protein in UniProt/Swiss-prot and assign the function of the matched protein, right? It does not require best-reciprocal hit, right? Thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Mon Mar 20 08:03:10 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 20 Mar 2017 09:03:10 -0400 Subject: [maker-devel] putative gene function by mapping to UniProt/Swiss-prot set In-Reply-To: References: Message-ID: Hi Quanwei, Correct. Just the best hit when blasting the MAKER generated fasta sequences to Swiss-prot. Thanks, Mike > On Mar 17, 2017, at 6:37 PM, Quanwei Zhang wrote: > > Hello: > > I have a questions about the assigning putative gene function by mapping to UniProt/Swiss-prot gene set (described in the protocol published in 2014). > Here, for each of the gene model from Maker, the pipeline will find the most similar protein in UniProt/Swiss-prot and assign the function of the matched protein, right? > It does not require best-reciprocal hit, right? 
>
> Thanks
> Best
> Quanwei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From qwzhang0601 at gmail.com Mon Mar 20 12:09:28 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Mon, 20 Mar 2017 13:09:28 -0400
Subject: [maker-devel] evidence of transcript assembly
Message-ID: 

Hello:

I am using MAKER2 to annotate genes in a new rodent species. I have found some published RNA-seq data sets that include selected open reading frames. Generally, the authors built the transcript assembly with Trinity, then mapped the raw transcript assemblies to the mouse genome and selected those that fully or partly cover mouse genes. I have a question about the transcript assembly evidence for MAKER. Which do you think is the best choice of evidence for MAKER2?
(1) All the Trinity transcript assemblies?
(2) Trinity transcript assemblies that fully cover the mouse genes?
(3) Trinity transcript assemblies that either fully or partly cover the mouse genes?

Many thanks

Best
Quanwei

From glenna.kramer at utoronto.ca Mon Mar 20 20:37:45 2017
From: glenna.kramer at utoronto.ca (Glenna Kramer)
Date: Tue, 21 Mar 2017 01:37:45 +0000
Subject: [maker-devel] GFF no longer valid after renaming genes
Message-ID: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>

Hi there,

I am hoping that you can give me some assistance with finishing up my MAKER-annotated genome for submission. I have been able to rename the genes for GenBank submission using Support Protocol 2 in the paper by Campbell et al., "Genome Annotation and Curation Using MAKER and MAKER-P", Curr Protoc Bioinformatics. 2014; 48: 4.11.1-4.11.39 (PMC4286374). I have also been able to use Support Protocol 3 from that same paper to assign putative gene functions. However, I am running into problems when I try to convert the GFF file to the tbl format for submission. I have tried to use scripts from GAG (Genome Annotation Generator) and maker (gff32table). Both of these scripts work wonderfully on the gff originally output from maker, but do not work once I rename the genes for GenBank submission. When I feed my file into a gff validator, it turns out that my gff is valid prior to renaming, but after I rename, the gff is no longer valid. I have been trying to troubleshoot what is happening to my gff when I rename as in Support Protocol 2, but am stumped. Has anyone else out there had a similar issue? I would be very thankful for any insight that you can provide!

Best,
Glenna

Not sure if this will be helpful, but here is an example gene from prior to renaming:

##gff-version 3
ChromoV|quiver|quiver maker gene 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9
ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189
ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1
ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1

And after renaming:

##gff-version 3
ChromoV|quiver|quiver maker gene 62081 62650 . + . ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;
ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00;
ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1;
ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1;

The commands I used were:

% maker_map_ids --prefix A9K44_ --justify 4 myfilename.gff > myfilename.map

% map_gff_ids myfilename.map myfilename.gff
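One way to narrow down what a renaming step did to a GFF3 file is to check that every Parent attribute still points at an ID defined somewhere in the file. The Python sketch below is only an illustration and is not part of MAKER or GAG: the function names and the toy records are made up for the example, and it assumes tab-separated GFF3 with nine columns.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Parent=y;' into {'ID': 'x', 'Parent': 'y'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:  # tolerate a trailing ';'
            continue
        key, _, value = field.partition("=")
        attrs[key] = value
    return attrs


def find_orphan_parents(gff_lines):
    """Return (feature ID, parent ID) pairs whose parent ID is never defined."""
    ids = set()
    links = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                links.append((attrs.get("ID"), parent))
    return [(child, parent) for child, parent in links if parent not in ids]


# Toy records (hypothetical): the mRNA's Parent no longer matches the gene ID.
toy = [
    "##gff-version 3",
    "ctg1\tmaker\tgene\t1\t100\t.\t+\t.\tID=GENE_0001|tail;Name=GENE_0001;",
    "ctg1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=GENE_0001-RA;Parent=GENE_0001;",
    "ctg1\tmaker\texon\t1\t100\t.\t+\t.\tID=GENE_0001-RA:exon:1;Parent=GENE_0001-RA;",
]
orphans = find_orphan_parents(toy)
print(orphans)
```

Any pair reported here is a feature whose parent cannot be resolved, which is one of the things a GFF3 validator or a tbl converter is likely to reject.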
From adf at ncgr.org Mon Mar 20 20:49:22 2017
From: adf at ncgr.org (Andrew Farmer)
Date: Mon, 20 Mar 2017 19:49:22 -0600
Subject: [maker-devel] GFF no longer valid after renaming genes
In-Reply-To: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>
References: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>
Message-ID: <127be156-b2bd-574f-5187-9942f05220e2@ncgr.org>

Hi Glenna,

This may be totally off-base, but I have a vague memory that some validators will complain about the semicolon after the last attribute in the column nine attribute list. It's not clear to me from the specification that this is truly illegal, but I can imagine why a parser might not like to deal with it. In any case, you might try just removing that terminal semicolon character and see if that solves the validation complaint.

Apologies in advance if my dim recollection has misled me into wasting your time...

Andrew Farmer

On 3/20/17 7:37 PM, Glenna Kramer wrote:
> Hi there,
>
> I am hoping that you can give me some assistance with finishing up my
> maker annotated genome for submission. I have been able to rename the
> genes for GenBank submission - using Support Protocol 2 in the paper
> by Campbell et. al "Genome Annotation and Curation Using MAKER and
> MAKER-P" Curr Protoc Bioinformatics. 2014; 48: 4.11.1-4.11.39.
> (PMC4286374).
> I have also been able to use the Support Protocol 3 from that same
> paper to assign a putative gene function. However, I am running into
> problems when I am trying to convert the GFF file to the tbl format
> for submission. I have tried to use scripts from GAG (Genome
> Annotation Generator) and maker (gff32table). Both of these scripts
> work wonderfully on the gff originally output from maker, but do not
> work once I rename the genes for GenBank submission.
When I feed my > file into a gff validator it turns out that my gff is valid prior to > renaming, but after I rename the gff is no longer valid. I have been > trying to troubleshoot what is happening to my gff when I rename as in > Support Protocol 2, but am stumped. Has anyone else out there had a > similar issue? I would be very thankful for any insight that you can > provide! > > Best, > Glenna > > Not sure if this will be helpful, but here is an example gene from > prior to renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9 > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189 > ChromoV|quiver|quiver maker exon 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > > And after renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . > ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9; > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . 
> ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00;
> ChromoV|quiver|quiver maker exon 62081 62650 . + .
> ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1;
> ChromoV|quiver|quiver maker CDS 62081 62650 . + 0
> ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1;
>
> The commands I used were:
>
> % maker_map_ids --prefix A9K44_ --justify 4 myfilename.gff > myfilename.map
>
> % map_gff_ids myfilename.map myfilename.gff
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-- 
...all concepts in which an entire process is semiotically concentrated elude definition; only that which has no history is definable.
Friedrich Nietzsche

From carsonhh at gmail.com Tue Mar 21 11:15:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 Mar 2017 10:15:20 -0600
Subject: [maker-devel] GFF no longer valid after renaming genes
In-Reply-To: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>
References: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>
Message-ID: <5DFD02E2-2C6F-49DA-90DE-9E17EE0A8CE2@gmail.com>

The problem appears to be the multiple '|' characters in your contig names (ChromoV|quiver|quiver). They end up in the gene ID, and since '|' has a special meaning in Perl regular expressions (it separates alternatives), it creates weird replacement behavior. I've attached two scripts that will fix that.
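A rough illustration of the replacement behavior described above, written in Python rather than Perl, since both regex engines treat an unescaped '|' as alternation. This is a sketch and not the code of map_gff_ids: the old and new IDs are taken from the example records in this thread, and the single, non-global substitution is an assumption made to mimic the reported output.

```python
import re

old_id = "augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9"
new_id = "A9K44_2555"
attributes = "ID=" + old_id + ";Name=" + old_id

# Unescaped: the pattern is read as three alternatives,
# "augustus_masked-ChromoV", "quiver" and "quiver-processed-gene-0.9",
# so only the first piece of the old ID is replaced.
broken = re.sub(old_id, new_id, attributes, count=1)

# Escaped: every metacharacter ('|', '.', ...) is matched literally,
# so the whole old ID is replaced wherever it occurs.
fixed = re.sub(re.escape(old_id), new_id, attributes)

print(broken)
print(fixed)
```

The first printed line keeps the "|quiver|quiver-processed-gene-0.9" tail, much like the renamed IDs in the report, while the second is the intended result. In Perl, a common way to get the literal match is to quote the pattern with `\Q...\E` or `quotemeta`.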
Use the two attached scripts to replace their counterparts in the .../maker/bin/ and .../maker/src/bin/ directories, then rerun all renaming steps on a new gff3 (not the one you already tried to rename).

Also, you may want to consider changing the IDs in the assembly itself before you release it or use it for analysis. You would want to remove the '|quiver|quiver' tail on every contig. That tail has the potential to cause hidden downstream analysis errors in other tools for the same reason outlined above, since '|' characters have special meaning.

Thanks,
Carson

> On Mar 20, 2017, at 7:37 PM, Glenna Kramer wrote:
>
> Hi there,
>
> I am hoping that you can give me some assistance with finishing up my maker annotated genome for submission. I have been able to rename the genes for GenBank submission - using Support Protocol 2 in the paper by Campbell et. al "Genome Annotation and Curation Using MAKER and MAKER-P" Curr Protoc Bioinformatics. 2014; 48: 4.11.1-4.11.39. (PMC4286374). I have also been able to use the Support Protocol 3 from that same paper to assign a putative gene function. However, I am running into problems when I am trying to convert the GFF file to the tbl format for submission. I have tried to use scripts from GAG (Genome Annotation Generator) and maker (gff32table). Both of these scripts work wonderfully on the gff originally output from maker, but do not work once I rename the genes for GenBank submission. When I feed my file into a gff validator it turns out that my gff is valid prior to renaming, but after I rename the gff is no longer valid. I have been trying to troubleshoot what is happening to my gff when I rename as in Support Protocol 2, but am stumped. Has anyone else out there had a similar issue? I would be very thankful for any insight that you can provide!
>
> Best,
> Glenna
>
> Not sure if this will be helpful, but here is an example gene from prior to renaming:
>
> ##gff-version 3
> ChromoV|quiver|quiver maker gene 62081 62650 . + .
ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9 > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189 > ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > > And after renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9; > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00; > ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; > ChromoV|quiver|quiver maker CDS 62081 62650 . 
+ 0 ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; > > The commands I used were: > > % maker_map_ids --prefix_A9K44_ --justify 4 myfilename.gff>myfilename.map > > %map_gff_ids myfilename.map myfilename.gff > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: map_fasta_ids Type: application/octet-stream Size: 1676 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: map_gff_ids Type: application/octet-stream Size: 5048 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 21 12:00:06 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Mar 2017 11:00:06 -0600 Subject: [maker-devel] split genes In-Reply-To: References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com> Message-ID: I have no suggestions, but maybe someone else on the list may have some. ?Carson > On Mar 17, 2017, at 11:49 AM, Quanwei Zhang wrote: > > Thank you for your explanation. But do you have any suggestions on such issues? Is there any tools to detect such split genes or any other tool can even further improve the gene models obtained by Maker? Thanks. > > Best > Quanwei > > 2017-03-17 11:21 GMT-04:00 Carson Holt >: > MAKER will not try and predict a gene across contigs because it it too difficult to determine contig order. If you are able to determine order, then it is best to merge the contigs into a single scaffold before annotating rather than try and produce split models in GFF3. 
> > ?Carson > > > On Mar 16, 2017, at 9:48 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > If one gene was covered by two contigs, sometimes we may predicted two genes. I wonder how Maker deal with such conditions? > > Even Maker tried to reduce such cases, they can not be completely avoid. So I wonder whether there is any way or any tool to find such split genes (one gene split into two contigs and predicted as two genes)? > > > > As we know, we can also provide protein sequences and transcript assembly as evidences. Can a protein sequence or transcript assembly rescue the split genes in Maker pipe line? For example, if one transcript cover 40% of predicted genes predicted in two contigs, then merge the predicted genes into one? > > > > Thanks > > > > Best > > Quanwei > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 21 12:01:30 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Mar 2017 11:01:30 -0600 Subject: [maker-devel] evidence of transcript assembly In-Reply-To: References: Message-ID: <297B9C95-919E-4D4F-9103-1FED1550B745@gmail.com> Different sources of data will have different levels of quality. You may want to run them all, then look at results in a browser like Apollo. If specific source look like they are more problematic than others, then drop them. ?Carson > On Mar 20, 2017, at 11:09 AM, Quanwei Zhang wrote: > > Hello: > > I am using Maker2 to do gene annotation on a new rodent species. I have found some published RNA-seq data and there are selected open reading frames. Generally they get the transcript assembly through Trinity, after that they mapped the raw transcript assemblies to mouse genome and selected those with full coverage of mouse genes or part coverage. 
> I have a question about transcript assembly evidence for MAKER. Which do you think is the best choice as evidence for MAKER2?
> (1) All the Trinity transcript assemblies?
> (2) Trinity transcript assemblies that fully cover the mouse genes?
> (3) Trinity transcript assemblies that either fully or partly cover the mouse genes?
>
> Many thanks
>
> Best
> Quanwei

From cjfields at illinois.edu Tue Mar 21 12:47:21 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 21 Mar 2017 17:47:21 +0000
Subject: [maker-devel] split genes
In-Reply-To:
References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>
Message-ID:

Just curious, but have you tried scaffolding your assembly using your RNA-Seq de novo assembly data? We've seen some improvement in BUSCO calls and annotation after doing this with L_RNA_Scaffolder (though you do need to be a bit careful and try reducing your transcript assembly down to a somewhat non-redundant set).

chris

From: maker-devel on behalf of Carson Holt
Date: Tuesday, March 21, 2017 at 12:00 PM
To: Quanwei Zhang
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] split genes

I have no suggestions, but maybe someone else on the list may have some.

--Carson

On Mar 17, 2017, at 11:49 AM, Quanwei Zhang wrote:

Thank you for your explanation. But do you have any suggestions on such issues? Are there any tools to detect such split genes, or any other tool that can further improve the gene models obtained by MAKER? Thanks.

Best
Quanwei

2017-03-17 11:21 GMT-04:00 Carson Holt:
MAKER will not try to predict a gene across contigs because it is too difficult to determine contig order. If you are able to determine order, then it is best to merge the contigs into a single scaffold before annotating rather than try to produce split models in GFF3.
?Carson > On Mar 16, 2017, at 9:48 PM, Quanwei Zhang > wrote: > > Hello: > > If one gene was covered by two contigs, sometimes we may predicted two genes. I wonder how Maker deal with such conditions? > Even Maker tried to reduce such cases, they can not be completely avoid. So I wonder whether there is any way or any tool to find such split genes (one gene split into two contigs and predicted as two genes)? > > As we know, we can also provide protein sequences and transcript assembly as evidences. Can a protein sequence or transcript assembly rescue the split genes in Maker pipe line? For example, if one transcript cover 40% of predicted genes predicted in two contigs, then merge the predicted genes into one? > > Thanks > > Best > Quanwei > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainer.rutka at uni-konstanz.de Fri Mar 24 04:10:45 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Fri, 24 Mar 2017 10:10:45 +0100 Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE In-Reply-To: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com> References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com> Message-ID: HI! First of all thank your for previous help. Running Maker 2.31.9 with MPI (Intel) is running fine, if we use ONE node only. But, if we try to concatenate more than one node (e.g. 2 node a? 8 cores) we get this error: [...] 
### Running Maker example MOAB_PROCCOUNT: 16 slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead STATUS: Parsing control files... Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184. [...] /tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356 was created before and is EXISTING during the period of the job continuance. I attached the complete log to this e-mail. Again: THANK YOU VERY MUCH. All the best. -- Rainer Rutka Universit?t Konstanz Kommunikations-, Informations-, Medienzentrum (KIM) * KIM Ausbildung * Wissenschaftliches Rechnen/bwHPC-C5 * KIM Basisdienste, KIM Support Raum: V511 78457 Konstanz +49 7531 88-5413 -------------- next part -------------- #!/bin/bash #MSUB -N maker-job #MSUB -j oe #MSUB -o $(JOBNAME).$(JOBID) #MSUB -m ae # -M given_name.family_name at your-uni.de #MSUB -l nodes=2:ppn=8 #MSUB -l mem=20gb #MSUB -l walltime=01:00:00 # start=$(date +%s) echo " " echo "### Setting up shell environment ..." echo " " # if test -e "/etc/profile"; then source "/etc/profile"; fi; if test -e "$HOME/.bash_profile"; then source "$HOME/.bash_profile"; fi; unset LANG; export LC_ALL="C"; export MKL_NUM_THREADS=1; export OMP_NUM_THREADS=1 export USER=${USER:=`logname`} export MOAB_JOBID=${MOAB_JOBID:=`date +%s`} export MOAB_SUBMITDIR=${MOAB_SUBMITDIR:=`pwd`} export MOAB_JOBNAME=${MOAB_JOBNAME:=`basename "$0"`} export MOAB_JOBNAME=$(echo "${MOAB_JOBNAME}" | sed 's/[^a-zA-Z0-9._-]/_/g') export MOAB_NODECOUNT=${MOAB_NODECOUNT:=1} export MOAB_PROCCOUNT=${MOAB_PROCCOUNT:=1} ulimit -s 200000 echo " " echo "### Printing basic job infos to stdout ..." 
echo " " echo "START_TIME = `date +'%y-%m-%d %H:%M:%S %s'`" echo "HOSTNAME = ${HOSTNAME}" echo "USER = ${USER}" echo "MOAB_JOBNAME = ${MOAB_JOBNAME}" echo "MOAB_JOBID = ${MOAB_JOBID}" echo "MOAB_SUBMITDIR = ${MOAB_SUBMITDIR}" echo "MOAB_NODECOUNT = ${MOAB_NODECOUNT}" echo "MOAB_PROCCOUNT = ${MOAB_PROCCOUNT}" echo "SLURM_NODELIST = ${SLURM_NODELIST}" echo "PBS_NODEFILE = ${PBS_NODEFILE}" if test -f "${PBS_NODEFILE}"; then echo "PBS_NODEFILE (begin) ---------------------------------" NO_NODES=$(wc -l < ${PBS_NODEFILE}) cat "${PBS_NODEFILE}" echo "PBS_NODEFILE (end) -----------------------------------" else NO_NODES=1 fi # ############################################################################## echo " " echo "### Creating TMP_WORK_DIR directory and changing to it ..." echo " " # Using "/tmp/$USER" should be ok for one node jobs. In case of multi-node jobs # it might be neccessary to modify TMP_BASE_DIR to point to SLURM_SUBMIT_DIR # or to create (and delete) TMP_WORK_DIR on each node (job-type dependent). # NEVER EVER calculate in your home directory. JOB_WORK_DIR="${SLURM_JOB_NAME}.uc1.${SLURM_JOB_ID%%.*}.$(date +%y%m%d_%H%M%S)" if test -z "$SLURM_NNODES" -o "$SLURM_NNODES" = "1" then TMP_BASE_DIR="/tmp/${USER}" else # in case of 2 or more nodes, use a common scratch dir available on all nodes... TMP_BASE_DIR="$SLURM_SUBMIT_DIR" fi TMP_WORK_DIR="${TMP_BASE_DIR}/${JOB_WORK_DIR}" echo "JOB_WORK_DIR = ${JOB_WORK_DIR}" echo "TMP_BASE_DIR = ${TMP_BASE_DIR}" echo "TMP_WORK_DIR cd = ${TMP_WORK_DIR}" mkdir -vp "${TMP_WORK_DIR}" && { cd "${TMP_WORK_DIR}"; pwd; } || { echo "ERROR: cd $TMP_WORK_DIR"; exit 1; } # Remarks: # * The job's temporary subdirectory JOB_WORK_DIR consists of SLURM_JOB_NAME # and SLURM_JOB_ID connected by ".uc1.". This is a little bit of magic since # the output file of your job follows the same rule. 
Therefore the # sorting of files belonging to one job will work nicely, when you # list the result files later in the submit directory (SLURM_SUBMIT_DIR). # * Using TMP_BASE_DIR="/tmp/$USER" is ok, if the job requires less # than 3.6 TB of node local disk space (for details see "www.bwhpc-c5.de"). # ############################################################################## echo " " echo "### Loading MAKER module:" echo " " module load bio/maker/2.31.9 [ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; } echo "MAKER_VERSION = $MAKER_VERSION" module list echo " " echo "### Copying input examples files for job:" echo " " cp -v ${MAKER_EXA_DIR}/*.{fasta,ctl} . sleep 2 echo " " echo "### Display internal Maker/bwHPC environments..." echo " " echo "MAKER_BIN_DIR = ${MAKER_BIN_DIR}" echo "MAKER_EXA_DIR = ${MAKER_EXA_DIR}" echo "" echo " " echo "### Runing Maker example" echo " " export OMPI_MCA_mpi_warn_on_fork=0 # # Do NOT use mpiexec here. Unfortunately this crashes # "STATUS: Processing and indexing input FASTA files..." # exec.hydra -n 2 maker -h echo "MOAB_PROCCOUNT: ${MOAB_PROCCOUNT:=1}" # do NOT use mpiexec. use mpiexec.hydra or mpirun. # mpirun -n ${MOAB_PROCCOUNT} maker -h # mpirun -n ${MOAB_PROCCOUNT} maker 2>&1 >maker_$(date +%Y-%m-%d_%H:%M:%S).out mpirun -n ${MOAB_PROCCOUNT} maker echo "### Cleaning up files ... removing unnecessary scratch files ..." echo " " # rm -fv sleep 3 # Sleep some time so potential stale nfs handles can disappear. echo " " echo "### Compressing results and copying back result archive ..." echo " " cd "${TMP_BASE_DIR}" mkdir -vp "${MOAB_SUBMITDIR}" # if user has deleted or moved the submit dir echo "Creating result tgz-file '${MOAB_SUBMITDIR}/${JOB_WORK_DIR}.tgz' ..." tar -zcvf "${MOAB_SUBMITDIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \ || { echo "ERROR: Failed to create tgz-file. 
Please cleanup TMP_WORK_DIR '$TMP_WORK_DIR' on host '$HOSTNAME' manually (if not done automatically by queueing system)."; exit 102; } # Remarks: # * The resulting tgz file is copied back to the submit directory. # The name of the tgz file looks similar too # "bwunicluster-maker-example.moab.275.110528_101755.tgz" echo " " echo "### Final cleanup: Remove TMP_WORK_DIR ..." echo " " rm -rvf "${TMP_WORK_DIR}" echo "END_TIME = `date +'%y-%m-%d %H:%M:%S %s'`" end=$(date +%s) echo " " echo "### Calculate duration ..." echo " " diff=$[end-start] if [ $diff -lt 60 ]; then echo "Runtime (approx.): '$diff' secs" elif [ $diff -ge 60 ]; then echo 'Runtime (approx.): '$[$diff / 60] 'min(s) '$[$diff % 60] 'secs' fi -------------- next part -------------- ### Setting up shell environment ... ### Printing basic job infos to stdout ... START_TIME = 17-03-24 04:35:21 1490326521 HOSTNAME = uc1n385 USER = kn_pop235844 MOAB_JOBNAME = maker-job MOAB_JOBID = 11658541 MOAB_SUBMITDIR = /pfs/work2/workspace/scratch/kn_pop235844-wstest-0 MOAB_NODECOUNT = 2 MOAB_PROCCOUNT = 16 SLURM_NODELIST = uc1n[385,397] PBS_NODEFILE = ### Creating TMP_WORK_DIR directory and changing to it ... 
JOB_WORK_DIR = maker-job.uc1.11658541.170324_043521 TMP_BASE_DIR = /tmp/kn_pop235844 TMP_WORK_DIR cd = /tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521 mkdir: created directory '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521' /tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521 ### Loading MAKER module: MAKER_VERSION = 2.31.9 Currently Loaded Modulefiles: 1) compiler/intel/16.0(default) 2) mpi/impi/5.1.3-intel-16.0(default) 3) bio/maker/2.31.9 ### Copying input examples files for job: '/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_contig.fasta' -> './dpp_contig.fasta' '/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_est.fasta' -> './dpp_est.fasta' '/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_protein.fasta' -> './dpp_protein.fasta' '/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_bopts.ctl' -> './maker_bopts.ctl' '/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_exe.ctl' -> './maker_exe.ctl' '/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_opts.ctl' -> './maker_opts.ctl' ### Display internal Maker/bwHPC environments... MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.9/bin MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples ### Runing Maker example MOAB_PROCCOUNT: 16 slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521': No such file or directory: going to /tmp instead STATUS: Parsing control files... Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184. Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184. Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184. Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184. 
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184. Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184. Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184. Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184. ### Cleaning up files ... removing unnecessary scratch files ... ### Compressing results and copying back result archive ... Creating result tgz-file '/pfs/work2/workspace/scratch/kn_pop235844-wstest-0/maker-job.uc1.11658541.170324_043521.tgz' ... maker-job.uc1.11658541.170324_043521/ maker-job.uc1.11658541.170324_043521/dpp_contig.fasta maker-job.uc1.11658541.170324_043521/dpp_est.fasta maker-job.uc1.11658541.170324_043521/dpp_protein.fasta maker-job.uc1.11658541.170324_043521/maker_bopts.ctl maker-job.uc1.11658541.170324_043521/maker_exe.ctl maker-job.uc1.11658541.170324_043521/maker_opts.ctl maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/ maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/.NFSLock.gi_lock.NFSLock maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_opts.log maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_bopts.log maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_exe.log ### Final cleanup: Remove TMP_WORK_DIR ... 
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.fasta' removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_est.fasta' removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_protein.fasta' removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_bopts.ctl' removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_exe.ctl' removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_opts.ctl' removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/.NFSLock.gi_lock.NFSLock' removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_opts.log' removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_bopts.log' removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_exe.log' removed directory: '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output' removed directory: '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521' END_TIME = 17-03-24 04:36:08 1490326568 ### Calculate duration ... Runtime (approx.): '47' secs -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From carsonhh at gmail.com Fri Mar 24 10:00:58 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 24 Mar 2017 09:00:58 -0600
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID: <2D6022EE-3AFC-4B87-99A3-2D310995A844@gmail.com>

This error -->
slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead

It is from SLURM and not from MAKER. It occurs before your job has even started. It's from the SLURM initialization of one of the nodes you are using.

Note that /tmp is not shared. It is independent on each node. So /tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356 may exist on one node, but not on the others. Since you are somehow setting this before you launch the job, SLURM is complaining because it doesn't exist on one of the other nodes during initialization. So you need to review how you are launching things.

--Carson

> On Mar 24, 2017, at 3:10 AM, Rainer Rutka wrote:
>
> HI!
> First of all thank you for your previous help.
> Maker 2.31.9 with MPI (Intel) runs fine if we
> use ONE node only.
>
> But if we try to combine more than one node (e.g. 2 nodes with 8
> cores each) we get this error:
>
> [...]
> ### Running Maker example
>
> MOAB_PROCCOUNT: 16
> slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead
> STATUS: Parsing control files...
> Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
> [...]
>
> /tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356
> was created beforehand and EXISTS for the whole duration of the
> job.
>
> I attached the complete log to this e-mail.
>
> Again: THANK YOU VERY MUCH.
>
> All the best.
>
> --
> Rainer Rutka
> Universität Konstanz
> Kommunikations-, Informations-, Medienzentrum (KIM)
> * KIM Ausbildung
> * Wissenschaftliches Rechnen/bwHPC-C5
> * KIM Basisdienste, KIM Support
> Raum: V511
> 78457 Konstanz
> +49 7531 88-5413
>

From carson.holt at genetics.utah.edu Wed Mar 29 13:12:35 2017
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 29 Mar 2017 18:12:35 +0000
Subject: [maker-devel] non-M gene models
In-Reply-To: <59ca4391-d32e-bfa8-4118-8c9586f3dfe4@email.arizona.edu>
References: <717138b6-fc7f-8f23-e550-c3019c4f96ec@email.arizona.edu> <59ca4391-d32e-bfa8-4118-8c9586f3dfe4@email.arizona.edu>
Message-ID: <0AD41A2D-9CFE-48DE-B338-F15D3A590B30@genetics.utah.edu>

Maybe. Those two options can result in a lot of partial models. Setting always_complete=1 will also help some.

Models without M at the start are generally partial models. There is often something about the contig that keeps it from being a whole model (a single base pair error breaks the ORF or a splice site, or a string of N's overlaps part of an exon). You can also try identifying InterPro domains and dropping any model without a defined domain (i.e. if it's going to be partial, at least make sure it's useful in its partial form).
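
To see how many models are affected, and which ones, the first residue of every protein can be tallied directly from the protein FASTA. The following is only a sketch and not part of MAKER: the function names `first_residue_counts` and `non_m_ids` are made up for illustration. Unlike a `grep -A1` pipeline, it also copes with sequences wrapped over several lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MAKER): tally the first residue of every
# record in a protein FASTA, including records wrapped over several lines.
first_residue_counts() {
    awk '
        /^>/ { want = 1; next }                  # header: the next sequence line starts a record
        want && length($0) > 0 {                 # first non-empty sequence line of that record
            count[toupper(substr($0, 1, 1))]++
            want = 0
        }
        END { for (aa in count) print count[aa], aa }
    ' "$@" | sort -k2,2
}

# Hypothetical helper: print the IDs of records whose protein does not start with M.
non_m_ids() {
    awk '
        /^>/ { id = substr($1, 2); want = 1; next }   # ID = header up to the first blank
        want && length($0) > 0 {
            if (toupper(substr($0, 1, 1)) != "M") print id
            want = 0
        }
    ' "$@"
}
```

Both functions read the files given as arguments, or standard input if none are given, for example `first_residue_counts maker_proteins_161026.fasta`. The ID list from `non_m_ids` could then be used to pull those models out for a closer look (InterPro domains, overlap with gaps, and so on).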
--Carson

On Mar 29, 2017, at 4:23 AM, Dario Copetti wrote:

Looking at the config file again I notice this:

est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no

I usually turn them on only to get models from ESTs to train Augustus and SNAP: do you think that having these parameters on during the final annotation will produce the non-M models?
If so, do you think that re-running MAKER with them turned off and using the MAKER-derived gff3 will clean out these models? Can you elaborate a bit more on the usage of these two parameters?

Thanks,
Dario

On 3/29/2017 12:07 PM, Dario Copetti wrote:

Hi Carson,

We are ready to submit several different sets of annotations, but we are now stuck on the issue of models whose protein sequence does not start with Met, and NCBI is picky about that.
Below I paste an example from a genome we are working on: as you see, most (95%) of the models start with M, but a significant fraction (almost 1500 models!) does not. We used MAKER 2.31.8, specifying the option of having models that only start with M. We realize that this issue may not be easy to fix - and also that there are indeed isoforms that do not start with M - but how would you fix this? Within or outside MAKER, I mean; any help will be appreciated.

Some time ago, Josh and Sharon (cc'd) fixed the models by having the CDS start at the first M that was in frame with the exon, and wrote a script for that. Is this issue maybe fixed in a newer version of MAKER? How else would you fix it or deal with the NCBI genomes people?

Thanks,
Dario

grep -A1 ">" maker_proteins_161026.fasta | grep -v ">" | grep -v "\-\-" | cut -c1 | sort | uniq -c
    106 A
     33 C
     69 D
     88 E
     53 F
     94 G
     34 H
     86 I
     77 K
    144 L
  28245 M
     58 N
     72 P
     44 Q
     95 R
    142 S
     80 T
    114 V
     29 W
      6 X
     53 Y

--
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5
1657 E. Helen St.
Tucson, AZ 85721, USA
www.genome.arizona.edu

--
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5
1657 E. Helen St.
Tucson, AZ 85721, USA
www.genome.arizona.edu

From annabel.beichman at gmail.com Thu Mar 30 12:51:36 2017
From: annabel.beichman at gmail.com (Annabel Beichman)
Date: Thu, 30 Mar 2017 10:51:36 -0700
Subject: [maker-devel] RepeatMasker masking olfactory receptors
Message-ID: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>

Hi Carson,
I have a question about RepeatMasker within MAKER:
I am finding that all class II olfactory receptors (families like OR2, OR5) are being masked by RepeatMasker as "RTE-BovB" repeats. This leads to them not being annotated by MAKER. I don't expect my species (a mustelid) to have a large number of Bov-B repeats, and when I put the sequences annotated in my genome as RTE-BovB into Repbase's CENSOR, only 13 out of 960 sequences have a hit to anything in Repbase. If I put those same sequences into NCBI BLAST, however, they all hit olfactory receptors. I am finding the same pattern with another related mustelid de novo genome. I also took the Ensembl ferret genome and ran it through the same pipeline, and am finding a large number of Bov-B repeats there as well, despite there being none in the official annotation of that genome.

I used RepeatMasker with all species libraries, plus a custom library from RepeatModeler.

Any idea what might be going on?

Thanks so much!
~ Annabel

From 4urelie.K at gmail.com Thu Mar 30 13:54:07 2017
From: 4urelie.K at gmail.com (Aurelie K)
Date: Thu, 30 Mar 2017 12:54:07 -0600
Subject: [maker-devel] RepeatMasker masking olfactory receptors
In-Reply-To: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>
References: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>
Message-ID:

Hi Annabel,

I would run RM by specifying your (group of) species, using the -species option of RepeatMasker, especially if you have a custom de novo library. This will limit the cross-masking of repeats that have been identified in other species.

Cheers,
Aurelie

On 30 March 2017 at 11:51, Annabel Beichman wrote:

> Hi Carson,
> I have a question about RepeatMasker within MAKER:
> I am finding that all class II olfactory receptors (families like OR2,
> OR5) are being masked by RepeatMasker as "RTE-BovB" repeats. This leads to
> them not being annotated by MAKER. I don't expect my species (a mustelid)
> to have a large number of Bov-B repeats, and when I put the sequences
> annotated in my genome as RTE-BovB into Repbase's CENSOR only 13 out of 960
> sequences have a hit to anything in Repbase. If I put those same sequences
> into NCBI BLAST, however, they all hit olfactory receptors. I am
> finding the same pattern with another related mustelid de novo genome, and
> took the Ensembl ferret genome and ran it through the same pipeline and am
> finding a large number of Bov-B repeats there as well, despite there being
> none in the official annotation of that genome.
>
> I used RepeatMasker with all species libraries, plus a custom library from
> RepeatModeler.
>
> Any idea what might be going on?
>
> Thanks so much!
>
> ~ Annabel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rainer.rutka at uni-konstanz.de Wed Mar 1 05:51:05 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Wed, 1 Mar 2017 13:51:05 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
Message-ID: <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>

Sorry, sent wrong e-mail :-(
IGNORE THE FIRST MAIL I SENT!

On 01.03.2017 at 13:30, Rainer Rutka wrote:

Hi Carson.
Again THANK YOU for your efforts :-)

On 24.02.2017 at 18:30, Carson Holt wrote:
> Specific things.
>
> 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause
> problems with other MPI's.

OK, I deleted this environment variable. It is not set any more.

> 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has
> to be compiled for the flavor you are using, so make sure you have a
> separate installation of MAKER for Intel MPI). Also validate that the
> mpicc and libmpi.h listed during the MAKER install belong to Intel
> MPI. Don't just assume they do because you loaded the module. Manually
> verify the paths during MAKER's setup.

I validated:

UC:[kn at uc1n996 bwhpc-examples]$ module list
Currently Loaded Modulefiles:
  1) compiler/intel/16.0(default)
  2) mpi/impi/5.1.3-intel-16.0(default)

FOR MPICC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpicc
mpicc is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc

FOR LIBMPI:
UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64
UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h

Here I can find an mpi.h but not a libmpi.h. But I think this is OK, because the software was compiled and linked without any errors or missing libs.

> 3. The error you got previously should not even be possible with the
> current version of Intel MPI,
> which is why I say that when you called mpiexec, something else (that
> was not Intel MPI) was launched.
> Easy solution is to give the full path of mpiexec in your job, so you are
> not relying on PATH to be unaltered in your job.

mpiexec is in the PATH and the right one is/was used, too:

MPIEXEC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec
mpiexec is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec

> Do not do --> mpiexec -nc 1 maker
> Do this for example -->
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker

OK, so I did:

[...]
#MSUB -l nodes=1:ppn=1
#MSUB -l mem=20gb
[...]
echo " "
echo "### Runing Maker example"
echo " "
export OMPI_MCA_mpi_warn_on_fork=0
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker
[...]

> 4. Build and run on the same node for your test. If you build on one
> node and run on another, you may
> be changing your environment in ways you don't realize that break
> things.
> So if you can build and test on
> the same node and it works, then it fails when you test it elsewhere,
> then you have to track down how your
> environment is changing.

OK I did. Same node: uc1n996

UNFORTUNATELY I GOT THE SAME ERROR:

[...]
Currently Loaded Modulefiles:
  1) compiler/intel/16.0(default)
  2) mpi/impi/5.1.3-intel-16.0(default)
  3) bio/maker/2.31.8_impi

### Display internal Maker/bwHPC environments...

MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bin
MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bwhpc-examples

### Runing Maker example

OMPI_MCA_mpi_warn_on_fork=0
I_MPI_CPUINFO=proc
I_MPI_PMI_LIBRARY=/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/lib/libmpi.so
I_MPI_PIN_DOMAIN=node
I_MPI_FABRICS=shm:tcp
I_MPI_HYDRA_IFACE=ib0
mpiexec_uc1n326.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n326.localdomain_kn_pop235844); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)

### Cleaning up files ... removing unnecessary scratch files ...
[...]

> --Carson

tbc. :-)

THANX

--
Rainer Rutka
Universität Konstanz
Kommunikations-, Informations-, Medienzentrum (KIM)
* KIM Ausbildung
* Wissenschaftliches Rechnen/bwHPC-C5
* KIM Basisdienste, KIM Support
Raum: V511
78457 Konstanz
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From carsonhh at gmail.com Wed Mar 1 13:32:54 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 13:32:54 -0700
Subject: [maker-devel] SOBA statistics of Maker annotation
In-Reply-To: <2377C5DD-569C-4248-B458-349D7AEA32F5@ucr.edu>
References: <688EB172-FEC8-4995-8AA2-0925AF62201A@ucr.edu>
 <6551374B-54FF-4047-B7A8-A49327FC0036@gmail.com>
 <73526BAB-57F8-4A47-AADD-DB6883573EAB@ucr.edu>
 <2377C5DD-569C-4248-B458-349D7AEA32F5@ucr.edu>
Message-ID: <6E776F59-F71F-49F7-872A-A0E404970C7E@gmail.com>

Perhaps with the way you are counting sequence from the RepeatMasker
report you are double counting for repeats that overlap? MAKER reports
the command line it uses as part of its STDERR, so you can manually run
any step you want outside of MAKER to evaluate.

--Carson

> On Feb 25, 2017, at 10:14 AM, Qihua Liang wrote:
>
> Thank you Barry and Carson!
>
> I compared the SOBA statistics of the RepeatMasker footprint with the
> report generated by running RepeatMasker alone, and I got 2 different
> percentages of repeats masked. Running RepeatMasker with
> myTrained.lib, the repeats masked are 42%. But within the Maker GFF3,
> the percentage of repeats masked is only ~18%. What may cause such a
> difference here?
>
> Thanks
> Qihua
>
>> On Feb 21, 2017, at 1:34 PM, Carson Holt wrote:
>>
>> MAKER merges overlapping RepeatMasker results into a single longer
>> feature.
>>
>> --Carson
>>
>>> On Feb 20, 2017, at 1:34 PM, Qihua Liang wrote:
>>>
>>> Hi Carson,
>>>
>>> Thanks for your reply! Now I understand the minimal length in the
>>> SOBA analysis of Maker gene models in GFF3.
>>>
>>> I am also using SOBA to calculate the statistics of other sources
>>> in the GFF3 file, and I have found another strange thing about
>>> RepeatMasker annotation and footprint percentage.
>>>
>>> Previously, I ran RepeatMasker outside of Maker once, with
>>> my_trained.lib (same as used in Maker), and I had bases masked of
>>> ~42% from the output report.
>>> In running Maker, I provided both "model_org=all" and
>>> "rmlib=my_trained.lib". Under these settings, RepeatMasker should
>>> be run twice and the merged results of the two runs will be the
>>> output of RepeatMasker in GFF3. I am expecting the bases masked by
>>> RepeatMasker in the GFF3 will be more than 42%.
>>>
>>> But in the SOBA calculation, the footprint percentage is only ~18%.
>>> Referring to the SOBA paper, footprint is calculated as
>>> "non-redundant nucleotide count of all features of a given type".
>>> I assume that when SOBA calculates the footprint of RepeatMasker
>>> features in GFF3, it should be counting the same as "masked bps" as
>>> RepeatMasker itself.
>>>
>>> When Maker "combines" the 2 runs of RepeatMasker, is it a merge or
>>> an overlapping of 2 RepeatMasker results?
>>> Besides, instead of using SOBA, are there any accessory scripts
>>> updated in Maker to calculate the statistics of the annotations?
>>>
>>> Thanks
>>> Qihua
>>>
>>>> On Feb 19, 2017, at 10:05 PM, Carson Holt wrote:
>>>>
>>>> In GFF3 the CDS and UTR lengths are actually the merge of all CDS
>>>> or UTR features, but SOBA is reporting each part individually,
>>>> which may be causing your confusion. This is because SOBA reports
>>>> per feature statistics and not merged feature statistics.
>>>>
>>>> CDSs do not have to take up entire exons. For example start/stop
>>>> codons may cross splice sites and be split across exons (very
>>>> common). The result is that each part of the split CDS becomes a
>>>> separate feature. As a result SOBA will treat each one separately.
>>>> So a single bp CDS here is not abnormal, since the remaining part
>>>> of the CDS continues on the next exon as a separate line. The
>>>> exact same is true for UTR.
>>>>
>>>> If you want the merged length of the UTR and CDS, it is best to
>>>> pull that info out of the _QI= part of the GFF3 attributes for
>>>> each mRNA.
>>>>
>>>> What about single bp exons? Those cannot occur unless you gave an
>>>> input GFF3 with predictions that have single bp exons. The
>>>> predictors like SNAP and Augustus just won't produce them, with
>>>> one exception. They can potentially produce them for the
>>>> first/last exon. This is not because the exon is 1 bp, but rather
>>>> because the predictor only reports the CDS part of the exon. As a
>>>> result the stop/start codon may have only 1 bp overlapping that
>>>> exon, but once you add UTR the exon will extend from that point
>>>> and will no longer be 1 bp in length. But if the UTR never gets
>>>> added, then you can be left with a partial initial/terminal exon.
>>>>
>>>> However more than likely what you are seeing is just related to
>>>> how SOBA reports individual feature line stats as opposed to
>>>> merged stats for CDS and UTR.
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>>> On Feb 18, 2017, at 9:43 AM, Qihua Liang wrote:
>>>>>
>>>>> Dear Maker develop team,
>>>>>
>>>>> I used the SOBA website to calculate the statistics of a Maker
>>>>> annotation, and I found that the minimal length of some Maker
>>>>> features, like CDS, exon, 5' and 3' UTR, can be as short as
>>>>> 1 bp. Features with a length of 1 bp are confusing. When Maker
>>>>> combines different gene models and makes such predictions, how
>>>>> will it accept such abnormal exon/CDS lengths? And are there any
>>>>> parameters in the bopt.ctl or evm.ctl to avoid such abnormal
>>>>> gene models?
>>>>>
>>>>> Thanks
>>>>> Qihua
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Mar 1 13:36:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 13:36:17 -0700
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To:
References:
Message-ID:

If you submit too many simultaneous MAKER runs, then file locks will
start to collide and one run will slow down the others. You should
submit fewer simultaneous jobs and instead use MPI (maker must be
configured and compiled to use MPI).

An example MPI launch command for running on 200 CPUs on a cluster ->
mpiexec -n 200 maker 2> maker_mpi1.error

--Carson

> On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>
> Hello:
>
> I am doing genome annotation using Maker on our high performance
> computational cluster (HPC). Due to some issues with MPI, I submitted
> the Maker jobs several times under the same directory to the HPC.
> Following the example in the protocol (as shown below), when I submit
> the jobs I make them background processes with "&", except the first
> one. Is this necessary when I submit a job to an HPC? I found it took
> much longer than I expected (according to a test on a smaller data
> set). I am not sure whether setting the process as a background
> process leads to this issue?
>
> The example in the protocol
> % maker 2> maker1.error
> % maker 2> maker2.error &
> % maker 2> maker3.error &
> ......
>
> BTW, will the annotation of a shorter contig (e.g., 500bp) cost
> ~1/100 of the time that it costs to annotate a 50000bp contig? I am
> using SNAP for ab initio and RNA-seq assembly and protein sequences
> as evidence.
> I have more than half contigs shorter than 300bp (whose total length
> is only about 5% of the total length of all contigs), I want to know
> whether I can save about half (or only about 5%) of the time if I
> ignore those short contigs.
>
> Thanks
>
> Best
> Quanwei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From qwzhang0601 at gmail.com Wed Mar 1 14:09:30 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 1 Mar 2017 16:09:30 -0500
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com>
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com>
Message-ID:

Thank you. I have submitted my jobs to our server. What I plan to do is
like this: (1) split contigs into 50 files; (2) for each contig file,
collect the annotation into gff and the protein sequences into fasta
format; (3) manually merge the 50 gff files and protein sequence files.
Is what I am doing also correct?

Best
Quanwei

2017-03-01 15:54 GMT-05:00 Carson Holt :

> If you split into separate files, you can use the -g option to select
> the input file together with the -base option so all output goes to
> the same directory. Because they technically have different input
> files, this will avoid file locking issues. You have to use the
> -dsindex option at the end to rebuild the datastore index, so it
> looks like a single job. But that is one way to get around the issue.
>
> --Carson
>
> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>
>> Thank you. But I met some problems with MPI on our server. So now I
>> split my contigs into several files and annotate those files
>> separately. After I finish the annotation on each file, I will merge
>> the results.
>>
>> Thank you for your explanation!
>>
>> Best
>> Quanwei

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Mar 1 14:10:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 14:10:20 -0700
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To:
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com>
Message-ID: <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>

That will work.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Mar 1 17:43:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 17:43:30 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de>
 <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de>
 <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
 <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
 <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
Message-ID: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>

Try this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

If both of these fail, there is the chance that the Intel MPI you are
using was compiled on a different architecture than the one you are
launching it on. In that case the failure indicates a need to reinstall
Intel MPI for that architecture.
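
The two checks above could be wrapped in a small job-script fragment along the following lines. This is only a sketch: the helper name check_launcher and the MPI_BIN / MAKER_BIN variables are made up for illustration, and the default paths are simply the ones quoted in this thread, so they would need adjusting on any other system. It reports whether each launch exits cleanly; it does not diagnose why a launch fails.

```shell
#!/bin/sh
# Illustrative helper (not part of MAKER or Intel MPI): start a 2-process
# job through the given launcher and report whether it exited cleanly.
check_launcher() {
    launcher="$1"
    shift
    if "$launcher" -n 2 "$@" >/dev/null 2>&1; then
        echo "OK: $launcher -n 2 $*"
    else
        echo "FAIL: $launcher -n 2 $*"
        return 1
    fi
}

# Assumed install locations, copied from the paths quoted in this thread.
MPI_BIN="${MPI_BIN:-/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin}"
MAKER_BIN="${MAKER_BIN:-/opt/bwhpc/common/bio/maker/2.31.8_impi/bin}"

# Step 1 runs a trivial command, so a failure points at the launcher itself.
# Step 2 is only attempted once step 1 has passed.
if check_launcher "$MPI_BIN/mpiexec" echo Hello; then
    check_launcher "$MPI_BIN/mpiexec" "$MAKER_BIN/maker" -h ||
        echo "launcher works, but 'maker -h' failed under it"
else
    echo "launcher failed on 'echo Hello'; look at the MPI setup before MAKER"
fi
```

The same helper can be pointed at a different launcher binary by changing its first argument.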

The following may or may not work if the first two fail:

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

Also send me this file -> perl/lib/MAKER/ConfigData.pm

Thanks,
Carson

> On Mar 1, 2017, at 5:51 AM, Rainer Rutka wrote:
>
> Sorry, sent wrong e-mail :-(
>
> IGNORE THE FIRST MAIL I SENT!

From rainer.rutka at uni-konstanz.de Thu Mar 2 01:41:37 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Thu, 2 Mar 2017 09:41:37 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de>
 <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de>
 <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
 <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
 <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
 <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
Message-ID:

Hi Carson!

On 02.03.2017 at 01:43, Carson Holt wrote:
> Try this command ->
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello
> Then this command ->
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

Same error(s).

> If both of these fail, there is the chance that the Intel MPI you are
> using was compiled on a different architecture than the one you are
> launching it on. In that case the failure indicates a need to
> reinstall Intel MPI for that architecture.

Yes, they fail.

> The following may or may not work if the first two fail:
> Then this command ->
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello

WORKS FINE!

> Then this command ->
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

WORKS!

> Also send me this file -> perl/lib/MAKER/ConfigData.pm

Attached to this mail.

> Thanks,
> Carson

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ConfigData.pm
Type: application/x-perl
Size: 5424 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From rainer.rutka at uni-konstanz.de Thu Mar 2 02:07:07 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Thu, 2 Mar 2017 10:07:07 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de>
 <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de>
 <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
 <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
 <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
 <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
Message-ID: <6cd0a8c5-e6a5-a171-5f80-11d193627aeb@uni-konstanz.de>

> The following may or may not work if the first two fail:
> Then this command ->
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello
> Then this command ->
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

mpirun (not mpiexec) is running, too!

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From carsonhh at gmail.com Thu Mar 2 10:41:35 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 2 Mar 2017 10:41:35 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de>
 <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de>
 <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
 <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
 <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
 <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
Message-ID: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>

This command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

All that command does is start the launcher and print "Hello". So since
it failed, it means the issue is with your MPI installation (i.e. Intel
MPI itself). It would have to be reinstalled and recompiled. I would
not be surprised if the issues with the other MPI flavors you tried
were for the same reason. They were installed for one
architecture/compiler/library set, but you are running them on another
one. So they always fail.

The second command was an alternate launcher, but it relies on the same
underlying libraries as the first one. So if the first one failed, the
second one may fail (it may just happen later on).

So the issue boils down to one thing -> Your MPI is the issue. You need
to reinstall/reconfigure and once you can get your MPI working, you can
move onto trying MAKER.

Thanks,
Carson

From mnaymik at tgen.org Thu Mar 2 13:05:22 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Thu, 2 Mar 2017 13:05:22 -0700
Subject: [maker-devel] ThrowNullPointerException()
Message-ID:

I have maker running with MPI and I get this error over and over again
for every contig. Any ideas?

MAKER WARNING: All old files will be erased before continuing
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: 5239
Length: 1395
#---------------------------------------------------------------------

Error: NCBI C++ Exception:
"/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti

--
*This electronic message is intended to be for the use only of the
named recipient, and may contain information that is confidential or
privileged, including patient health information. If you are not the
intended recipient, you are hereby notified that any disclosure,
copying, distribution or use of the contents of this message is
strictly prohibited. If you have received this message in error or are
not the named recipient, please notify us immediately by contacting the
sender at the electronic mail address noted above, and delete and
destroy all copies of this message. Thank you.*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Mar 2 13:25:59 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 2 Mar 2017 13:25:59 -0700
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To:
References:
Message-ID: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>

Try reinstalling blast, or upgrade to a newer version of blast.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From d.ence at ufl.edu Fri Mar 3 09:48:34 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Fri, 3 Mar 2017 16:48:34 +0000
Subject: [maker-devel] how to deal with Contigs to run maker?
In-Reply-To: <2017022815435664227911@cau.edu.cn>
References: <2017022815435664227911@cau.edu.cn>
Message-ID: <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>

Hi Chao,

I don't think merging the contigs is a good idea. Unless you actually
know the distances (in basepairs) between the contigs, this could lead
to many spurious alignments. I think you should leave them separate in
your fasta file for repeatmodeler, ab-initio training and running
maker. If you're worried about short contigs in your assembly, you can
exclude shorter contigs with the min_contig option in the maker_opts
control file.

~Daniel

On Feb 28, 2017, at 2:43 AM, dcg at cau.edu.cn wrote:

> Dear sir:
> After assembling, I got many contigs and their order in each
> chromosome.
> What I have done is merging these contigs into each chromosome
> following that order, with 100 Ns inserted between each pair of
> contigs. So that I got chr1 chr2...... Then I ran the repeatmodeler
> and predictor to annotate it.
>
> Could my way reach a high-quality result? Should I use all the
> contigs to mask repeats and train the predictor?
> Is there any better way to do genome-wide annotation?
I'm looking forward to your reply!
Best wishes!

Chao Chao
2017.02.28

From carsonhh at gmail.com Fri Mar 3 10:32:15 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 3 Mar 2017 10:32:15 -0700
Subject: [maker-devel] how to deal with Contigs to run maker?
In-Reply-To: <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>
References: <2017022815435664227911@cau.edu.cn> <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>
Message-ID: <7CF3A765-5A93-42B2-AA28-4596CD25A459@gmail.com>

I agree. Also, a 100 bp insert of Ns will essentially be ignored by aligners and predictors. They'll jump across it as if it were just an intron, resulting in false merges and bad predictions.

--Carson

From rainer.rutka at uni-konstanz.de Mon Mar 6 01:21:20 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Mon, 6 Mar 2017 09:21:20 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID:

Hi Carson.

Again, thank you for your response.

But, sorry to say, it's not possible that our MPI is corrupt. We have approx. 1,500 users working on our bwUniCluster so far, and 95% of these users use MPI. And all our other software (see: cis-hpc.uni-konstanz.de) is running with our implementations of IMPI/OMPI without any issues.

:-()

On 02.03.2017 at 18:41, Carson Holt wrote:
> This command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello
>
> All that command does is start the launcher and print "Hello". So since it failed, it means the issue is with your MPI installation (i.e. Intel MPI itself).
> It would have to be reinstalled and recompiled. I would not be surprised if the issues with the other MPI flavors you tried were for the same reason. They were installed for one architecture/compiler/library set, but you are running them on another one. So they always fail.
>
> The second command was an alternate launcher, but it relies on the same underlying libraries as the first one. So if the first one failed, the second one may fail (it may just happen later on).
>
> So the issue boils down to one thing -> Your MPI is the issue. You need to reinstall/reconfigure, and once you can get your MPI working, you can move on to trying MAKER.
>
> Thanks,
> Carson
>
>> On Mar 2, 2017, at 1:41 AM, Rainer Rutka wrote:
>>
>> Hi Carson!
>>
>> On 02.03.2017 at 01:43, Carson Holt wrote:
>>> Try this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello
>>> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h
>> Same error(s).
>>
>>> If both of these fail, there is the chance that the Intel MPI you are using was compiled on a different architecture than the one you are launching it on. In that case the failure indicates a need to reinstall Intel MPI for that architecture.
>> Yes, they fail.
>>
>>> The following may or may not work if the first two fail:
>>> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello
>> WORKS FINE!
>>
>>> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h
>> WORKS!
>>
>>> Also send me this file -> perl/lib/MAKER/ConfigData.pm
>> Attached to this mail.
>>
>>> Thanks,
>>> Carson
>>
>> --
>> Rainer Rutka
>> University of Konstanz
>> Communication, Information, Media Centre (KIM)
>> * High-Performance-Computing (HPC)
>> * KIM-Support and -Base-Services
>> Room: V511
>> 78457 Konstanz, Germany
>> +49 7531 88-5413

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* KIM Training
* Scientific Computing/bwHPC-C5
* KIM Base Services, KIM Support
Room: V511
78457 Konstanz
+49 7531 88-5413

From carsonhh at gmail.com Mon Mar 6 07:47:51 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 6 Mar 2017 07:47:51 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID: <9B00FB6A-B5F5-4240-AB1E-4CBEEEB63C7F@gmail.com>

I was able to replicate the error as follows:

1. Intel MPI installed on CentOS kernel 6 (MPI works fine).
2. Upgrade to kernel 7 without reinstalling, and Intel MPI reports the same error as reported by the user.
3. After recompiling Intel MPI on kernel 7, the error goes away.

The proof that there is an issue with your Intel MPI installation is in this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

That command is simply trying to get mpiexec to launch "echo Hello" internally. And it failed. It's as simple as that.
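
The same launcher check can be scripted so it is easy to repeat on each node. This is only a minimal sketch in Python: the helper name `launcher_works` is hypothetical, and it assumes the launcher accepts `-n <count>` the way the mpiexec commands in this thread do.

```python
import subprocess


def launcher_works(launcher, nprocs=2, timeout=60):
    """Return True if `<launcher> -n <nprocs> echo Hello` prints Hello nprocs times.

    `launcher` is the argv prefix of the MPI launcher, given as a list,
    e.g. ["/full/path/to/mpiexec"] (full path, so PATH is not trusted).
    """
    cmd = list(launcher) + ["-n", str(nprocs), "echo", "Hello"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Launcher missing, not executable, or it hung.
        return False
    hellos = result.stdout.split().count("Hello")
    return result.returncode == 0 and hellos == nprocs
```

Calling it once with the full path to mpiexec and once with the full path to mpiexec.hydra would reproduce the two "echo Hello" tests quoted in this thread.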

Thanks,
Carson

From dussert.yann at gmail.com Mon Mar 6 09:51:59 2017
From: dussert.yann at gmail.com (YannDussert)
Date: Mon, 6 Mar 2017 17:51:59 +0100
Subject: [maker-devel] Differences in non_overlapping protein file between runs
Message-ID: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>

Hello,

First, thank you for developing MAKER, this is a great annotation tool!

I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set.
I then used Augustus, SNAP (both trained with models from the first round) and GeneMark for ab initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.

After the second round, I had only around 11000 annotated genes (96% completeness with BUSCO v2), whereas I'm expecting between 13000 and 17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation in a genome browser, one of the problems was apparently gene fusions due to bad protein evidence.

Following the advice in another post, I tried running MAKER by passing the ab initio predictions with pred_gff, to avoid using bad protein hints for the gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence. Did I miss something?

Thank you in advance for your answer.

Best regards,
Yann

From dcg at cau.edu.cn Sun Mar 5 04:26:59 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Sun, 5 Mar 2017 19:26:59 +0800
Subject: [maker-devel] For help about masking repeats before annotation
Message-ID: <2017030519265949065818@cau.edu.cn>

Dear sir:

Before running MAKER, I first mask repeats in my contigs. However, when I followed "Repeat Library Construction-Advanced", no results were generated after I ran LTRharvest, so I couldn't go any further. When I attempted to follow "Repeat Library Construction-Basic" to run RepeatModeler, a note caught my attention, even though RECON returned some results:

NOTE: RepeatScout did not return any models.

Is the situation above normal in the masking process? How can I deal with these problems to make a high-quality repeat library for my assembled contigs?

Hope to hear from you.
Best wishes!

Chao Chao
2017.03.05

From dcg at cau.edu.cn Mon Mar 6 05:24:17 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Mon, 6 Mar 2017 20:24:17 +0800
Subject: [maker-devel] How to merge the annotation results into chromosomes?
Message-ID: <2017030620241723514513@cau.edu.cn>

Dear sir:

Hello, I am doing my utmost to study annotation now. However, I have recently been confused about handling the results. After alignment, training and curation, we can get good gene models and merge them with gff3_merge and fasta_merge. But how can I merge them into different chromosomes like Homo_sapiens.GRCh38.87.chromosome.11.gff3.gz? I don't just want results for separate contigs.

I'm looking forward to your reply. Thanks a lot!
Best wishes!

Chao Chao
2017.03.06

From lucys-world at mailbox.org Mon Mar 6 07:40:33 2017
From: lucys-world at mailbox.org (lucys-world at mailbox.org)
Date: Mon, 6 Mar 2017 15:40:33 +0100 (CET)
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
Message-ID: <850873370.6534.1488811234072@office.mailbox.org>

Dear maker-devel group,

I have some issues with my MAKER ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP. After two MAKER runs I wanted to create a new HMM for the third MAKER run, but the command

fathom genome.ann genome.dna -gene-stats

resulted in 0 genes.

What I have done so far:

* For the first training run I only used BUSCO and the Swiss-Prot database as references (since no ESTs are available for my species).
  Additionally I set protein2genome=1.
* I was able to create an HMM based on all merged *.gff files, but these were not many:
  o Out of 27,032 scaffolds (sequences) only 280 were used for the HMM; here are the gene-stats:

    280 sequences
    0.458676 avg GC fraction (min=0.338014 max=0.708052)
    7445 genes (plus=3192 minus=4253)
    1621 (0.217730) single-exon
    5824 (0.782270) multi-exon
    168.412018 mean exon (min=1 max=5224)
    1464.349243 mean intron (min=30 max=41197)

* For the second MAKER run I then used this HMM and again the BUSCO+SwissPort.fasta reference file.
  o The gene-stats for the output of the second MAKER run are:

    282 sequences
    0.473125 avg GC fraction (min=0.338014 max=0.725131)
    0 genes (plus=0 minus=0)
    0 (-nan) single-exon
    0 (-nan) multi-exon
    -nan mean exon (min=2147483647 max=0)
    -nan mean intron (min=2147483647 max=0)

Would you recommend rerunning everything, e.g. with an additional Augustus gene prediction (species=human), or with ESTs from related species? (If so, how closely related?)

Thank you for your time and help.

Kind regards,
Lucy

From d.ence at ufl.edu Mon Mar 6 10:11:57 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Mon, 6 Mar 2017 17:11:57 +0000
Subject: [maker-devel] How to merge the annotation results into chromosomes?
In-Reply-To: <2017030620241723514513@cau.edu.cn>
References: <2017030620241723514513@cau.edu.cn>
Message-ID: <45D1390D-212D-42A4-9819-C0045601B013@mail.ufl.edu>

Hi,

Do you have data that can precisely place each of your contigs in their position on the chromosome? Without that, this isn't even possible, since a GFF3 file with the chromosomes instead of the contigs requires each contig's position in the chromosome.

And in any case, I don't think there is a script in the MAKER tools that does what you're asking. Maybe someone else has made a script to do that.
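
If the exact placement of every contig is known, the coordinate shift itself is simple. The following is only a rough sketch in Python, not a MAKER tool: the function name `lift_gff3_line` and the `layout` mapping are hypothetical, it handles forward-oriented contigs only (a reverse-complemented contig would also need its strand and coordinates flipped), and it leaves directives such as ##sequence-region untouched.

```python
def lift_gff3_line(line, layout):
    """Rewrite one GFF3 feature line from contig to chromosome coordinates.

    layout maps contig name -> (chromosome name, 0-based offset of the
    contig's first base on that chromosome). Forward orientation is assumed.
    """
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return line  # comments, directives and blank lines pass through
    cols = line.split("\t")
    if len(cols) != 9 or cols[0] not in layout:
        return line  # not a feature line, or a contig with no known placement
    chrom, offset = layout[cols[0]]
    cols[0] = chrom
    cols[3] = str(int(cols[3]) + offset)  # start (GFF3 is 1-based, inclusive)
    cols[4] = str(int(cols[4]) + offset)  # end
    return "\t".join(cols)


# Hypothetical layout: contig2 starts 1100 bases into chr1.
layout = {"contig1": ("chr1", 0), "contig2": ("chr1", 1100)}
print(lift_gff3_line("contig2\tmaker\tgene\t10\t200\t.\t+\t.\tID=g1", layout))
```

The offsets have to come from real placement data (for example a genetic or physical map); guessed gap sizes would put every downstream feature at the wrong coordinate.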

~Daniel

From d.ence at ufl.edu Mon Mar 6 10:15:07 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Mon, 6 Mar 2017 17:15:07 +0000
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
In-Reply-To: <850873370.6534.1488811234072@office.mailbox.org>
References: <850873370.6534.1488811234072@office.mailbox.org>
Message-ID: <970801D9-536E-494C-B5C7-F5F72125FAFC@mail.ufl.edu>

Hi Lucy,

What were your settings for the second training run? Did you leave protein2genome=1?

~Daniel

From carsonhh at gmail.com Mon Mar 6 12:48:49 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 6 Mar 2017 12:48:49 -0700
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
In-Reply-To: <850873370.6534.1488811234072@office.mailbox.org>
References: <850873370.6534.1488811234072@office.mailbox.org>
Message-ID: <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com>

It looks like you have no genes to train with. So you did something wrong on your second run. Either no gene predictor was running or you provided no evidence for the predictor, so you produced no models.
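
One quick way to confirm this before running fathom is to count the features in the merged GFF3 from the second run. A minimal sketch in Python follows; the function name `count_features` is hypothetical, and the assumption that final gene models carry "maker" in the source column should be checked against your own file.

```python
from collections import Counter


def count_features(gff3_lines):
    """Count (source, type) pairs in GFF3 feature lines, stopping at ##FASTA."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[(cols[1], cols[2])] += 1
    return counts
```

For example, `count_features(open("merged.gff"))[("maker", "gene")]` (with a hypothetical file name) returning 0 would mean there are no gene models to train on, whatever evidence alignments the file contains.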

--Carson

From qwzhang0601 at gmail.com Tue Mar 7 08:14:11 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 7 Mar 2017 10:14:11 -0500
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To: <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>
Message-ID:

Hi Carson:

I split my contigs into 50 files and annotated them in parallel. After the annotation finished, I used "gff3_merge -d" and "fasta_merge -d" to get the GFF and FASTA files for each of the 50 files. Now I am trying to merge those GFF files into one GFF. But I found that, after the annotation information, the contig sequences are appended to the GFF files. So I think I cannot simply merge them using the command "cat file1.gff file2.gff ...file50.gff > merged.gff". So I am considering merging those files in one of two ways; would you please suggest which one works?

(1) If the contig sequences are not useful for downstream functional annotation, remove all the contig sequences from those GFF files, and then merge the GFF files containing only annotation information using the "cat" command.
(2) Merge the annotation part and the contig sequence part (from those 50 GFF files) separately, then combine the two files (i.e., the file including all annotation information, and the file including all the contig sequences) by appending the contig sequences to the end of the annotation information.

Thanks

2017-03-01 16:10 GMT-05:00 Carson Holt :
> That will work.
>
> --Carson
>
> On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote:
>
> Thank you. I have submitted my jobs to our server. What I plan to do is this: (1) split the contigs into 50 files; (2) for each contig file, collect the annotation in GFF format and the protein sequences in FASTA format; (3) manually merge the 50 GFF files and protein sequence files. Is what I am doing also correct?
>
> Best
> Quanwei
>
> 2017-03-01 15:54 GMT-05:00 Carson Holt :
>
>> If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue.
>>
>> --Carson
>>
>> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>>
>> Thank you. But I ran into some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation on each file, I will merge the results.
>>
>> Thank you for your explanation!
>>
>> Best
>> Quanwei
>>
>> 2017-03-01 15:36 GMT-05:00 Carson Holt :
>>
>>> If you submit too many simultaneous MAKER runs, then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (MAKER must be configured and compiled to use MPI).
>>>
>>> An example MPI launch command for running on 200 CPUs on a cluster ->
>>> mpiexec -n 200 maker 2> maker_mpi1.error
>>>
>>> --Carson
>>>
>>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>>> >
>>> > Hello:
>>> >
>>> > I am doing genome annotation using MAKER on our high-performance computing cluster (HPC). Due to some issues with MPI, I submitted the MAKER jobs several times under the same directory to the HPC. Following the example in the protocol (as shown below), when I submit the jobs I make them background processes with "&", except the first one. Is this necessary when I submit a job to an HPC? I found it took much longer than I expected (according to a test on a smaller data set). I am not sure whether setting the processes as background processes leads to this issue?
>>> > >>> > The example in the protocol >>> > % maker 2> maker1.error >>> > % maker 2> maker2.error & >>> > % maker 2> maker3.error & >>> > ...... >>> > >>> > BTW, will the annotation on shorter contig (e.g., 500bp) cost ~ 1/100 >>> of the time that cost for annotation a 50000bp contig? I am using SNAP for >>> an inito and RNA-seq assembly and protein sequences as evidence. I have >>> more than half contigs shorter than 300bp (whose total length is only about >>> 5% of the total length of all contigs), I want to know whether I can save >>> about half (or only about 5%) of the time if I ignore those short contigs. >>> > >>> > Thanks >>> > >>> > Best >>> > Quanwei >>> > _______________________________________________ >>> > maker-devel mailing list >>> > maker-devel at box290.bluehost.com >>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yand >>> ell-lab.org >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 7 08:35:42 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 7 Mar 2017 08:35:42 -0700 Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI In-Reply-To: References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com> Message-ID: Use gff3_merge again without the -d option. Just give it all 50 files. --Carson Sent from my iPhone > On Mar 7, 2017, at 8:14 AM, Quanwei Zhang wrote: > > Hi Carson: > > I split my contigs into 50 files and annotated them parallelized. After annotation finish, I used "gff3_merge -d" and "fasta_merge -d" to get the gff and fasta files for each of the 50 files. Now I am trying to merge those gff files into one gff. But I found behind the annotation information, the contig sequences are attached into the gff files. So I think I can not simply merge them using the command "cat file1.gff file2.gff ...file50.gff > merged.gff". 
So I am considering to merge those files in two ways, would you please give me a suggestion (which works)? > (1) If the contigs sequences will not be useful for downstream functional annotation, then I want to remove all the contig sequences from those gff, and then merge gff file with only annotation information using "cat" command. > (2) Merge the annotation part and the contig sequences part (from those 50 gff files) separately, then merge the two file (i.e., the file including all annotation information, and the file including all the contigs sequences) by adding the contig sequence to the end of annotation information. > > Thanks > > > > 2017-03-01 16:10 GMT-05:00 Carson Holt : >> That will work. >> >> ?Carson >> >>> On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote: >>> >>> Thank you. I have submit my jobs to our server. What I plan to do is like this: (1) split contigs into 50 files; (2) for each contig file, I collected the annotation into gff and protein sequences into fasta format; (3) manually merge the 50 gff files and protein sequences files. Is what I am doing also correct? >>> >>> Best >>> Quanwei >>> >>> 2017-03-01 15:54 GMT-05:00 Carson Holt : >>>> If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue. >>>> >>>> ?Carson >>>> >>>> >>>> >>>>> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote: >>>>> >>>>> Thank you. But I met some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation on each file, I will merge the results. >>>>> >>>>> Thank you for your explanation! 
>>>>> >>>>> Best >>>>> Quanwei >>>>> >>>>> 2017-03-01 15:36 GMT-05:00 Carson Holt : >>>>>> If you submit too many simultaneous, MAKER run then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (maker must be configured and compiled to use MPI). >>>>>> >>>>>> An example MPI launch command for running on 200 CPUs on a cluster ?> >>>>>> mpiexec -n 200 maker 2> maker_mpi1.error >>>>>> >>>>>> ?Carson >>>>>> >>>>>> >>>>>> >>>>>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote: >>>>>> > >>>>>> > Hello: >>>>>> > >>>>>> > I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues of MPI, I submitted the Maker jobs several times under the same directory to HPC. Followed by the example in the protocol (as shown below), when I submit the jobs I make them as background processes by "&" except the first one. Is this necessary when I submit a job to a HPC? I found it costed much much longer time than I expected (according to a testing on a smaller data set). I am not sure whether setting the process as background process lead to this issue? >>>>>> > >>>>>> > The example in the protocol >>>>>> > % maker 2> maker1.error >>>>>> > % maker 2> maker2.error & >>>>>> > % maker 2> maker3.error & >>>>>> > ...... >>>>>> > >>>>>> > BTW, will the annotation on shorter contig (e.g., 500bp) cost ~ 1/100 of the time that cost for annotation a 50000bp contig? I am using SNAP for an inito and RNA-seq assembly and protein sequences as evidence. I have more than half contigs shorter than 300bp (whose total length is only about 5% of the total length of all contigs), I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs. 
>>>>>> > >>>>>> > Thanks >>>>>> > >>>>>> > Best >>>>>> > Quanwei >>>>>> > _______________________________________________ >>>>>> > maker-devel mailing list >>>>>> > maker-devel at box290.bluehost.com >>>>>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 7 08:35:42 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 7 Mar 2017 08:35:42 -0700 Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI In-Reply-To: References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com> Message-ID: Use gff3_merge again without the -d option. Just give it all 50 files. --Carson Sent from my iPhone > On Mar 7, 2017, at 8:14 AM, Quanwei Zhang wrote: > > Hi Carson: > > I split my contigs into 50 files and annotated them parallelized. After annotation finish, I used "gff3_merge -d" and "fasta_merge -d" to get the gff and fasta files for each of the 50 files. Now I am trying to merge those gff files into one gff. But I found behind the annotation information, the contig sequences are attached into the gff files. So I think I can not simply merge them using the command "cat file1.gff file2.gff ...file50.gff > merged.gff". So I am considering to merge those files in two ways, would you please give me a suggestion (which works)? > (1) If the contigs sequences will not be useful for downstream functional annotation, then I want to remove all the contig sequences from those gff, and then merge gff file with only annotation information using "cat" command. > (2) Merge the annotation part and the contig sequences part (from those 50 gff files) separately, then merge the two file (i.e., the file including all annotation information, and the file including all the contigs sequences) by adding the contig sequence to the end of annotation information. 
>
> Thanks
>
> 2017-03-01 16:10 GMT-05:00 Carson Holt:
>> That will work.
>>
>> --Carson
>>
>>> On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote:
>>>
>>> Thank you. I have submitted my jobs to our server. What I plan to do is this: (1) split the contigs into 50 files; (2) for each contig file, collect the annotation into GFF and the protein sequences into FASTA format; (3) manually merge the 50 GFF files and protein sequence files. Is what I am doing also correct?
>>>
>>> Best
>>> Quanwei
>>>
>>> 2017-03-01 15:54 GMT-05:00 Carson Holt:
>>>> If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue.
>>>>
>>>> --Carson
>>>>
>>>>> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>>>>>
>>>>> Thank you. But I ran into some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation on each file, I will merge the results.
>>>>>
>>>>> Thank you for your explanation!
>>>>>
>>>>> Best
>>>>> Quanwei
>>>>>
>>>>> 2017-03-01 15:36 GMT-05:00 Carson Holt:
>>>>>> If you submit too many simultaneous MAKER runs, then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (MAKER must be configured and compiled to use MPI).
>>>>>>
>>>>>> An example MPI launch command for running on 200 CPUs on a cluster -->
>>>>>> mpiexec -n 200 maker 2> maker_mpi1.error
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>>>>>> >
>>>>>> > Hello:
>>>>>> >
>>>>>> > I am doing genome annotation using MAKER on our high-performance computing (HPC) cluster. Due to some issues with MPI, I submitted the MAKER jobs several times under the same directory to the HPC. Following the example in the protocol (shown below), when I submit the jobs I run them as background processes with "&", except the first one. Is this necessary when I submit a job to an HPC? I found it took much longer than I expected (based on a test with a smaller data set). I am not sure whether running the processes in the background led to this issue.
>>>>>> >
>>>>>> > The example in the protocol
>>>>>> > % maker 2> maker1.error
>>>>>> > % maker 2> maker2.error &
>>>>>> > % maker 2> maker3.error &
>>>>>> > ......
>>>>>> >
>>>>>> > BTW, will the annotation of a shorter contig (e.g., 500bp) take ~1/100 of the time needed to annotate a 50000bp contig? I am using SNAP for ab initio prediction, and an RNA-seq assembly and protein sequences as evidence. More than half of my contigs are shorter than 300bp (their total length is only about 5% of the total length of all contigs). I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs.
>>>>>> >
>>>>>> > Thanks
>>>>>> >
>>>>>> > Best
>>>>>> > Quanwei
>>>>>> > _______________________________________________
>>>>>> > maker-devel mailing list
>>>>>> > maker-devel at box290.bluehost.com
>>>>>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
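
[Editorial sketch] Carson's reply in the thread above recommends running gff3_merge over all 50 files, and that remains the route to use. Purely to illustrate what option (2) in Quanwei's question involves, here is a minimal Python sketch that merges GFF3 documents carrying an embedded sequence section. It assumes each file marks the start of its sequences with the standard `##FASTA` directive; the function names `split_gff3` and `merge_gff3` are hypothetical and are not part of MAKER. It does not check for ID collisions between files.

```python
from typing import Iterable, List, Tuple


def split_gff3(text: str) -> Tuple[List[str], List[str]]:
    """Split one GFF3 document into (annotation lines, FASTA lines)."""
    annotations: List[str] = []
    fasta: List[str] = []
    in_fasta = False
    for line in text.splitlines():
        if not line.strip():
            continue  # drop blank lines in both sections
        if line.strip() == "##FASTA":
            in_fasta = True  # everything after this directive is sequence
            continue
        if in_fasta:
            fasta.append(line)
        elif line.startswith("##gff-version"):
            continue  # the version pragma is written once by merge_gff3
        else:
            annotations.append(line)
    return annotations, fasta


def merge_gff3(documents: Iterable[str]) -> str:
    """Concatenate annotation parts first, then all sequence parts."""
    all_annotations: List[str] = []
    all_fasta: List[str] = []
    for doc in documents:
        annotations, fasta = split_gff3(doc)
        all_annotations.extend(annotations)
        all_fasta.extend(fasta)
    out = ["##gff-version 3"] + all_annotations
    if all_fasta:
        out.append("##FASTA")  # a single directive before all sequences
        out.extend(all_fasta)
    return "\n".join(out) + "\n"
```

A plain `cat` of such files would leave a `##FASTA` block in the middle of the output, which is why the annotation and sequence parts are collected separately here.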
From chrisi.hahni at gmail.com Tue Mar 7 17:51:00 2017
From: chrisi.hahni at gmail.com (Christoph Hahn)
Date: Wed, 8 Mar 2017 01:51:00 +0100
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <119684F8-8071-4318-A129-3D90EC54242A@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com>
Message-ID: <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com>

Hi MAKER community,

I think I am seeing the same issue that Jason has reported. I ran cufflinks, then cufflinks2gff3, and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting GFF file from MAKER I only get protein2genome and repeatmasker evidence. If I search the MAKER log, est2genome never comes up. I also tried to extract the cufflinks results as FASTA and feed them to MAKER via 'est='. Still no indication that the evidence is used.

I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time!

cheers,
Christoph

On 10/02/2015 17:56, Carson Holt wrote:
> I ran a few est2genome runs with a cufflinks file I just generated and did not get any issues for EST-based gene models.
>
> I'd like to at least have your test set to see if I can duplicate what you are seeing.
>
> Use this to upload the job files, then I can just run it from my server here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>
> --Carson
>
>> On Feb 3, 2015, at 11:13 AM, Jason Gallant wrote:
>>
>> Hi Folks,
>>
>> I've nearly succeeded at getting MAKER to run on AWS... I've been checking the output files, and have noticed that none of my RNAseq data was incorporated in the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting gff3 files. I even ran the accessory script gff3_merge to check that the resulting file was properly formatted.
>>
>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting maker gff3 file, and the genes predicted by these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step.
>>
>> Any help would (as always) be greatly appreciated!
>>
>> Best,
>> Jason Gallant
>>
>> --
>> Dr. Jason R. Gallant
>> Assistant Professor
>> Room 38 Natural Sciences
>> Department of Zoology
>> Michigan State University
>> East Lansing, MI 48824
>> jgallant at msu.edu
>> office: 517-884-7756

From o.k.torresen at ibv.uio.no Thu Mar 9 02:36:27 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Mar 2017 09:36:27 +0000
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
Message-ID: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no>

Hi all,

I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately.

First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta.
Is 3.1 available somewhere, or is it going to be released soon?

I also saw a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued by the last sentence: "...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association." Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this.

Thank you.

Sincerely,
Ole

From carsonhh at gmail.com Thu Mar 9 10:52:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Mar 2017 10:52:30 -0700
Subject: [maker-devel] Differences in non_overlapping protein file between runs
In-Reply-To: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>
References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>
Message-ID:

My guess is that there is an issue with the GFF3 file you supplied, so its features are not overlapping anything.

--Carson

> On Mar 6, 2017, at 9:51 AM, YannDussert wrote:
>
> Hello,
>
> First, thank you for developing MAKER, this is a great annotation tool!
>
> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.
>
> After the second round, I had only around 11000 annotated genes (96% completeness with BUSCO v2), whereas I'm expecting between 13000 and 17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation in a genome browser, one of the problems was apparently gene fusions due to bad protein evidence.
> Following the advice in another post, I tried running MAKER by passing the ab initio predictions with pred_gff, to avoid using bad protein hints for the gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence; did I miss something?
>
> Thank you in advance for your answer.
>
> Best regards,
> Yann

From carsonhh at gmail.com Thu Mar 9 11:39:11 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Mar 2017 11:39:11 -0700
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com>
Message-ID: <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com>

Jason never responded back to this one or uploaded his file to test. He probably figured it out off list.

My guess is that your results are too fragmented to build a model that can pass the filtering thresholds. If you want, I can take a look. You can upload all files for a test job here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

--Carson

From carsonhh at gmail.com Thu Mar 9 11:51:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Mar 2017 11:51:25 -0700
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
In-Reply-To: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no>
References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no>
Message-ID: <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com>

Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus.

It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome).

I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome and a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc.

It uses the prediction tool described in this paper (the paper contains several examples of variation we can properly predict against) --> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure

--Carson
From lucys-world at mailbox.org Tue Mar 7 01:39:40 2017
From: lucys-world at mailbox.org (lucys-world at mailbox.org)
Date: Tue, 7 Mar 2017 09:39:40 +0100 (CET)
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
In-Reply-To: <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com>
References: <850873370.6534.1488811234072@office.mailbox.org> <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com>
Message-ID: <1407048207.7112.1488875981292@office.mailbox.org>

Hello Carson, hello Daniel,

thank you for your fast reply and help.

To Daniel's question: Yes, unfortunately I had protein2genome=1 in all runs.

To Carson: After reading a lot through the forum I realized that I had misunderstood ab initio gene prediction. I thought one had to perform three MAKER runs in total: one training run and then two MAKER runs for annotation. But now I think there are only two MAKER runs to perform in total (one training run and then one annotation run); is that correct?

So after my first run I created an HMM based on the first gene-stats (with 7445 genes) and performed my second run with this HMM. Then I tried to create a new HMM based on my second run's output. I think that is not necessary, since the output of the second run should be my annotated genome?

I think I have to redo my MAKER runs, and for that I have two questions regarding maker_opts.ctl:

1. Training run: For that I have to give MAKER my genome and my evidence (in my case the BUSCO and Swiss-Prot data sets) and set protein2genome=1. Since that is my only evidence, I don't change anything else? (I don't add anything in the gene prediction paragraph?)

2. Annotation run: With the GFF output of the training run I create my own HMM with SNAP. In maker_opts.ctl I then add my SNAP HMM for this annotation run and set AugustusSpecies to the most closely related species (as recommended in the Augustus manual); is that correct? Do I also give my protein evidence as I did in the training run?
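
[Editorial sketch] The zero-gene symptom in this thread shows up in the `fathom ... -gene-stats` reports that Lucy quotes. As a hedged illustration only, the Python sketch below parses a report in that layout and refuses to proceed when no genes are present. The two sample reports are copied from the figures in this thread; the helper names `parse_gene_stats` and `has_training_genes` are hypothetical, and the regular expressions assume the report layout shown here rather than any documented SNAP format.

```python
import re
from typing import Dict

# Line patterns for the report layout quoted in this thread (an assumption).
_PATTERNS = {
    "sequences": r"^[ \t]*(\d+)[ \t]+sequences\b",
    "genes": r"^[ \t]*(\d+)[ \t]+genes\b",
    "single_exon": r"^[ \t]*(\d+)[ \t]+\([^)]*\)[ \t]+single-exon\b",
    "multi_exon": r"^[ \t]*(\d+)[ \t]+\([^)]*\)[ \t]+multi-exon\b",
}


def parse_gene_stats(text: str) -> Dict[str, int]:
    """Extract the integer counts from a gene-stats style report."""
    counts: Dict[str, int] = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            counts[key] = int(match.group(1))
    return counts


def has_training_genes(text: str, minimum: int = 1) -> bool:
    """True when the report lists at least `minimum` genes."""
    return parse_gene_stats(text).get("genes", 0) >= minimum


# Sample reports, using the numbers posted in this thread.
FIRST_RUN = """280 sequences
0.458676 avg GC fraction (min=0.338014 max=0.708052)
7445 genes (plus=3192 minus=4253)
1621 (0.217730) single-exon
5824 (0.782270) multi-exon
"""

SECOND_RUN = """282 sequences
0.473125 avg GC fraction (min=0.338014 max=0.725131)
0 genes (plus=0 minus=0)
0 (-nan) single-exon
0 (-nan) multi-exon
"""

if __name__ == "__main__":
    for label, report in (("first run", FIRST_RUN), ("second run", SECOND_RUN)):
        print(label, parse_gene_stats(report), has_training_genes(report))
```

A check like this could sit between the MAKER run and the HMM-building step, so that an empty gene set is noticed before training rather than after.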
Thank you very much for your time and help with that ! - Lucy > Carson Holt hat am 6. M?rz 2017 um 20:48 geschrieben: > > It looks like you have no genes to train with. So you did something wrong on your second run. Either no gene predictor was running or you provided no evidence for the predictor, so you produced no models. > > ?Carson > > > > > > On Mar 6, 2017, at 7:40 AM, lucys-world at mailbox.org mailto:lucys-world at mailbox.org wrote: > > > > > > Dear maker-devel group, > > > > > > I have some issues with my maker ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP. > > > > after two maker runs I wanted to create a new HMM for the third maker run, but the command > > > > > > fathom genome.ann genoma.dna -gene-stats > > > > > > resulted in 0 genes. > > > > > > What have I done so far: > > > > * for the first training run I only used BUSCO and Swiss-Port data bank as references (Since no EST are available for my species). Additionally I set protein2genome =1 > > > > > > * I was able to create an HMM based on all merged *.gff But these were not many: > > o out of 27.032 Scafolds (Sequences) only 280 were used for the HMM; here the gene-stats: > > o 280 sequences > > 0.458676 avg GC fraction (min=0.338014 max=0.708052) > > 7445 genes (plus=3192 minus=4253) > > 1621 (0.217730) single-exon > > 5824 (0.782270) multi-exon > > 168.412018 mean exon (min=1 max=5224) > > 1464.349243 mean intron (min=30 max=41197) > > > > > > * For the second maker run I then used this HMM and again the BUSCO+SwissPort.fasta reference file. > > o the gene-stats for the output of the second maker run are: > > o 282 sequences > > 0.473125 avg GC fraction (min=0.338014 max=0.725131) > > 0 genes (plus=0 minus=0) > > 0 (-nan) single-exon > > 0 (-nan) multi-exon > > -nan mean exon (min=2147483647 max=0) > > -nan mean intron (min=2147483647 max=0) > > > > > > Would you recommend to rerun everything, e.g. 
with an additional Augustus gene prediction (species=human), or EST from related species? (If so how close related?) > > > > > > Thank you for your time and help > > > > kind regards > > > > Lucy > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com mailto:maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From o.k.torresen at ibv.uio.no Thu Mar 9 12:42:31 2017 From: o.k.torresen at ibv.uio.no (=?utf-8?B?T2xlIEtyaXN0aWFuIFTDuHJyZXNlbg==?=) Date: Thu, 9 Mar 2017 19:42:31 +0000 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing In-Reply-To: <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> Message-ID: <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> Hi Carson. In the article I linked to, The draft genome sequence and annotation of the desert woodrat Neotoma lepida (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/).? So I guess this version is not available, or maybe they meant 3.0beta1 or something. ACE looks like a really cool tool, I?ll pass it on to people that have the correct datasets. Thank you. Ole > On 09 Mar 2017, at 19:51, Carson Holt wrote: > > Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus. > > It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome). 
> > I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. > > It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) ?> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure > > ?Carson > > > >> On Mar 9, 2017, at 2:36 AM, Ole Kristian T?rresen wrote: >> >> Hi all, >> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. >> >> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? >> >> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence ?...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association.? Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this. >> >> Thank you. 
>> >> Sincerely, >> Ole >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Thu Mar 9 12:50:10 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 9 Mar 2017 12:50:10 -0700 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing In-Reply-To: <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> Message-ID: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> My guess is that Michael may have called it 3.1 because he used the subversion repository which is beyond the 3.0-beta download but has not been packaged for release yet. ?Carson > On Mar 9, 2017, at 12:42 PM, Ole Kristian T?rresen wrote: > > Hi Carson. > > In the article I linked to, The draft genome sequence and annotation of the desert woodrat Neotoma lepida (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/).? > > So I guess this version is not available, or maybe they meant 3.0beta1 or something. > > ACE looks like a really cool tool, I?ll pass it on to people that have the correct datasets. > > Thank you. > > Ole > >> On 09 Mar 2017, at 19:51, Carson Holt wrote: >> >> Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus. >> >> It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome). 
>> >> I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. >> >> It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) ?> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure >> >> ?Carson >> >> >> >>> On Mar 9, 2017, at 2:36 AM, Ole Kristian T?rresen wrote: >>> >>> Hi all, >>> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. >>> >>> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? >>> >>> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence ?...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association.? Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this. >>> >>> Thank you. 
>>> >>> Sincerely, >>> Ole >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > From o.k.torresen at ibv.uio.no Thu Mar 9 12:55:00 2017 From: o.k.torresen at ibv.uio.no (=?utf-8?B?T2xlIEtyaXN0aWFuIFTDuHJyZXNlbg==?=) Date: Thu, 9 Mar 2017 19:55:00 +0000 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing In-Reply-To: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> Message-ID: Ah, thank you. That explains it. Ole > On 09 Mar 2017, at 20:50, Carson Holt wrote: > > My guess is that Michael may have called it 3.1 because he used the subversion repository which is beyond the 3.0-beta download but has not been packaged for release yet. > > ?Carson > > >> On Mar 9, 2017, at 12:42 PM, Ole Kristian T?rresen wrote: >> >> Hi Carson. >> >> In the article I linked to, The draft genome sequence and annotation of the desert woodrat Neotoma lepida (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/).? >> >> So I guess this version is not available, or maybe they meant 3.0beta1 or something. >> >> ACE looks like a really cool tool, I?ll pass it on to people that have the correct datasets. >> >> Thank you. >> >> Ole >> >>> On 09 Mar 2017, at 19:51, Carson Holt wrote: >>> >>> Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus. 
>>> >>> It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome). >>> >>> I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. >>> >>> It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) ?> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure >>> >>> ?Carson >>> >>> >>> >>>> On Mar 9, 2017, at 2:36 AM, Ole Kristian T?rresen wrote: >>>> >>>> Hi all, >>>> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. >>>> >>>> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? >>>> >>>> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence ?...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association.? Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this. >>>> >>>> Thank you. 
>>>> >>>> Sincerely, >>>> Ole >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >> > From o.k.torresen at ibv.uio.no Thu Mar 9 12:59:35 2017 From: o.k.torresen at ibv.uio.no (=?utf-8?B?T2xlIEtyaXN0aWFuIFTDuHJyZXNlbg==?=) Date: Thu, 9 Mar 2017 19:59:35 +0000 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing In-Reply-To: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> Message-ID: <0B73432A-E0EE-4983-8314-E8A94AADA74F@ibv.uio.no> Ah, thank you. That explains it. Ole > On 09 Mar 2017, at 20:50, Carson Holt wrote: > > My guess is that Michael may have called it 3.1 because he used the subversion repository which is beyond the 3.0-beta download but has not been packaged for release yet. > > ?Carson > > >> On Mar 9, 2017, at 12:42 PM, Ole Kristian T?rresen wrote: >> >> Hi Carson. >> >> In the article I linked to, The draft genome sequence and annotation of the desert woodrat Neotoma lepida (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/).? >> >> So I guess this version is not available, or maybe they meant 3.0beta1 or something. >> >> ACE looks like a really cool tool, I?ll pass it on to people that have the correct datasets. >> >> Thank you. >> >> Ole >> >>> On 09 Mar 2017, at 19:51, Carson Holt wrote: >>> >>> Currently only 3.0 beta is available. 
It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus. >>> >>> It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome). >>> >>> I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. >>> >>> It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) ?> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure >>> >>> ?Carson >>> >>> >>> >>>> On Mar 9, 2017, at 2:36 AM, Ole Kristian T?rresen wrote: >>>> >>>> Hi all, >>>> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. >>>> >>>> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? >>>> >>>> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence ?...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association.? Is this part of MAKER 3.1, or a separate effort? 
I am very interested in the status of this.
>>>>
>>>> Thank you.
>>>>
>>>> Sincerely,
>>>> Ole
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>
>

From chrisi.hahni at gmail.com Fri Mar 10 01:50:52 2017
From: chrisi.hahni at gmail.com (Christoph Hahn)
Date: Fri, 10 Mar 2017 09:50:52 +0100
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com>
Message-ID: <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com>

Dear Carson,

Thanks for getting in touch! I actually managed it in the end. I converted the GTF I had from cufflinks to GFF3 with the script 'gtf2gff.pl' from augustus, and then used the script 'gffGetmRNA.pl', again from augustus, to extract the mRNA as fasta. I fed this file to MAKER via the 'est=' route, and now I get plenty of est2genome evidence in the MAKER result. So the problem seems to be limited to the 'est_gff=' route: although there is no error message whatsoever, the est2genome routine never seems to be triggered.

I'd still be happy to upload my data (the cufflinks gff, the genome fasta, anything else?) if you want to try to reproduce the problem. Let me know!

BTW, I seem to be unable to create a new topic or respond to topics via Google Groups. Is the list closed, or is access restricted somehow? I only managed by replying directly from my Gmail to Jason's mail, which I still had in my inbox.

Thanks!

cheers,
Christoph

On 09/03/2017 19:39, Carson Holt wrote:
> Jason never responded back to this one or uploaded his file to test.
> He probably figured it out off list.
My guess is that your results are > too fragmented to build a model that can pass filtering thresholds with. > > If you want I can take a look. You can upload all files for a test job > here ?> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi > > ?Carson > > > >> On Mar 7, 2017, at 5:51 PM, Christoph Hahn > > wrote: >> >> Hi MAKER community, >> >> I think I am seeing the same issue that Jason has reported. ran >> cufflinks, then cufflinks2gff3 and tried to feed the result to MAKER >> via 'est_gff=' with 'est2genome=1'. In the resulting gff file from >> maker I only get protein2genome and repeatmasker evidence. If I do a >> search in the maker log est2genome never comes up. Tried to extract >> the cufflinks results as fasta and feed to MAKER via 'est='. Still no >> indication that the evidence is used. >> >> I am using MAKER 2.31.8. Any help would be much appreciated! Thanks >> in advance for your time! >> >> cheers, >> Christoph >> >> On 10/02/2015 17:56, Carson Holt wrote: >>> I ran a few est2genome runs with a cufflinks file i just generated >>> and did not get any issues for EST based gene models. >>> >>> I?d like to at least have your test set to see if I can duplicate >>> what you are seeing. >>> >>> Use this to upload the job files then I can just run it from my >>> server here ?> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>> >>> ?Carson >>> >>> >>>> On Feb 3, 2015, at 11:13 AM, Jason Gallant >>> > wrote: >>>> >>>> Hi Folks, >>>> >>>> I?ve nearly succeeded at getting MAKER to run on AWS? I?ve been >>>> checking the output files, and have noticed that none of my RNAseq >>>> data was incorporated on the run. I used Cufflinks to perform >>>> alignments of libraries from several tissues, ran the accessory >>>> script cufflinks2gff3 for each tissue, then concatenated the >>>> resulting gff3 files. I even ran the accessory script gff3merge to >>>> check that the resulting file was properly formatted. 
>>>> >>>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only >>>> get protein2genome and repeatmasker evidence in my resulting maker >>>> gff3 file, and the genes predicted by these. Is there another >>>> option that I need to enable in order to use my est_gff file? I?m >>>> trying to get a set of genes to train the predictors for my next step. >>>> >>>> Any help would (as always) be greatly appreciated! >>>> >>>> Best, >>>> Jason Gallant >>>> >>>> ? >>>> Dr. Jason R. Gallant >>>> Assistant Professor >>>> Room 38 Natural Sciences >>>> Department of Zoology >>>> Michigan State University >>>> East Lansing, MI 48824 >>>> jgallant at msu.edu >>>> office: 517-884-7756 >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dussert.yann at gmail.com Fri Mar 10 03:53:36 2017 From: dussert.yann at gmail.com (YannDussert) Date: Fri, 10 Mar 2017 11:53:36 +0100 Subject: [maker-devel] Differences in non_overlapping protein file between runs In-Reply-To: References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com> Message-ID: <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com> Hi, Thank you for your answer.To get my gff with ab-initio predictions, I just took the corresponding lines in the maker gff from the previous round. I can't see any problem with it, it looks like this: Plvit001 augustus_masked match 66626 70338 0.85 + . 
ID=Plvit001:hit:12095:4.5.0.0;Name=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 Plvit001 augustus_masked match_part 66626 67586 0.85 + . ID=Plvit001:hsp:27621:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1 961 +;Gap=M961 Plvit001 augustus match 66626 70338 1 + . ID=Plvit001:hit:12088:4.5.0.0;Name=augustus-Plvit001-abinit-gene-0.0-mRNA-1 Plvit001 augustus match_part 66626 70096 1 + . ID=Plvit001:hsp:27610:4.5.0.0;Parent=Plvit001:hit:12088:4.5.0.0;Target=augustus-Plvit001-abinit-gene-0.0-mRNA-1 1 3471 +;Gap=M3471 Plvit001 augustus_masked match_part 68166 68486 0.85 + . ID=Plvit001:hsp:27622:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 962 1282 +;Gap=M321 Plvit001 augustus_masked match_part 69504 70096 0.85 + . ID=Plvit001:hsp:27623:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1283 1875 +;Gap=M593 Plvit001 augustus_masked match_part 70174 70338 0.85 + . ID=Plvit001:hsp:27624:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1876 2040 +;Gap=M165 Best regards, Yann On 09/03/2017 18:52, Carson Holt wrote: > My guess is that there is either an issue with the GFF3 file you supplied, so its features are not overlapping anything. > > ?Carson > > >> On Mar 6, 2017, at 9:51 AM, YannDussert wrote: >> >> Hello, >> >> First, thank you for developing MAKER, this is a great annotation tool! >> >> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1. 
>> >> After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000-17000 genes (numbers from other annotated oomycetes). There was only around 1500 genes in the non_overlapping protein file. After looking at the annotation on a genome browser, one of the problems was apparently gene fusions due to bad protein evidence. Following the advice on another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence, did I miss something? >> >> Thank you in advance for your answer. >> >> Best regards, >> Yann >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ereboperezsilva at gmail.com Fri Mar 10 04:05:29 2017 From: ereboperezsilva at gmail.com (=?UTF-8?B?Sm9zw6kgTcKqIEcuIFBlcmV6LVNpbHZh?=) Date: Fri, 10 Mar 2017 12:05:29 +0100 Subject: [maker-devel] ERROR: Chunk failed Message-ID: Hi! I'm having some trouble understanding the ERROR I'm receiving. Recently I've set up a new machine to work annotate a genome (around 2 Gb big) using Maker. We mounted a new disk of 1Tb and loaded there the files of a uncomplete run of annotation (we started it in a different machine and move it to this one, which had more precessing power). 
Apparently everything was OK until, at some point yesterday, we received the following error:

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig4633

We are running 16 MAKER jobs at the same time on the unsplit genome. We checked, and the "df" command reported that only 7% of the mounted disk was in use, so space does not appear to be the problem. Why that error, then?

Thanks for the help.

José María González Pérez-Silva.
PhD student at Universidad de Oviedo.

From ereboperezsilva at gmail.com Fri Mar 10 10:21:38 2017
From: ereboperezsilva at gmail.com (José Mª G. Perez-Silva)
Date: Fri, 10 Mar 2017 18:21:38 +0100
Subject: [maker-devel] Maker ERROR
Message-ID:

Hi,

I wrote earlier today about a problem that (apparently) involved disk space. After deleting some unnecessary files (despite having plenty of storage left), I killed all the processes and set 'clean_try=1' as recommended in this post. Before re-running the processes, we checked that there was no limit on the size of a directory or anything similar.

After re-running, everything seemed correct at first, but when I checked again some time later I found a lot of contigs with the status FAILED and no folder specified in the '_master_datastore_index.log', looking like this:

Contig480 FAILED
Contig496 FAILED
Contig512 FAILED
Contig528 FAILED
Contig544 FAILED
Contig560 FAILED
...

But checking the 'nohup.out' of every process (16 in total, as the machine has 16 cores), I notice that each run does, from time to time, process a contig correctly. So, after several (a lot of) FAILED contigs, it processes one correctly.
As I said in the previous email, the error displayed in nohup.out is the following (including, at the beginning, the last part of a processed contig):

...
#--------- command -------------#
Widget::blastx:
/usr/bin/blastall -p blastx -d /data/ge/tmp/maker_VfDQQU/hsap_ensembl%2Efa.mpi.10.6 -i /data/ge/tmp/maker_VfDQQU/0/Contig20.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 4 -U -F T -I T -o /data/ge/round3/cg.maker.output/cg_datastore/56/AC/Contig20//theVoid.Contig20/0/Contig20.0.hsap_ensembl%2Efa.blastx.temp_dir/hsap_ensembl%2Efa.mpi.10.6.blastx
#-------------------------------#
deleted:511 hits
doing blastx of proteins
open3: fork failed: Cannot allocate memory at /home/jmgps/software/maker/bin/../lib/File/NFSLock.pm line 1037.
--> rank=NA, hostname=Planarian2
ERROR: Failed while doing blastx of proteins
ERROR: Chunk failed at level:8, tier_type:3
FAILED CONTIG:Contig20

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:Contig20

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig22

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig24

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig26

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig28
...

I'm totally lost here. I think it is still processing contigs, but the FAILED attempts slow down the whole process, and we are in a hurry because of the maintenance of the machine. I also can't understand the source of the error.

I will be more than happy to provide more details about the problem, if requested.

Thanks a lot for the help!

From carsonhh at gmail.com Fri Mar 10 10:34:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 Mar 2017 10:34:34 -0700
Subject: [maker-devel] Maker ERROR
In-Reply-To:
References:
Message-ID:

Several things.

1. MAKER does a lot of its work in a temporary directory (usually /tmp). This directory must be locally mounted and cannot be a network-mounted location. If this location is full, you can get issues.

2. MAKER needs at least 1GB of RAM per process (2-3GB is safer), so if you don't have enough RAM you may need to run fewer processes (with MPI, multiply whatever you supplied to the mpiexec -n flag by 1GB).

3. If you are launching MAKER multiple times as opposed to launching once via MPI, you will exacerbate the above limitations as well as open up IO limitations. MAKER can and does saturate IO when run multiple times simultaneously (this is especially true for network-mounted locations). If you run via MPI you can greatly reduce IO, so make sure you are using MPI and not just launching MAKER multiple times. If you absolutely have to start multiple jobs, you can reduce IO somewhat by splitting the input fasta into pieces (use fasta_tool). Give a separate piece to each job via maker's -g flag, and set -base so all results from all jobs get written to the same location. Then each job can avoid multiple file locks that would have been encountered by sharing input. Note that you must rebuild the datastore index using 'maker -dsindex' when all jobs complete.

--Carson

> On Mar 10, 2017, at 10:21 AM, José Mª G.
Perez-Silva wrote: > > Hi, > > I wrote early this day, in reference to a problem of (apparently) space. After I deleted some unnecesary files (despite having plenty of storage left), I killed all the processes, and set 'clean_try=1' as recomended in this post . Before re-running the processes, we checked that there were no limitation over the size of a directory or something similar. > > After re-running, at first, all seemed correct, but when I re-checked some time after, I found out a lot of contigs with the status FAILED without folder specification in the '_master_datastore_index.log', looking like: > > Contig480 FAILED > Contig496 FAILED > Contig512 FAILED > Contig528 FAILED > Contig544 FAILED > Contig560 FAILED? > > But checking the 'nohub.out' of every proccess (16 in total, as the machine has 16 cores), I notice that each run is, from time to time, processing the contig correctly. So, after several (a lot) of FAILED contigs, it process one correctly. As said in the previous email, the ERROR dispolayed in the nohup.out is (including the last part of a processed contig at the beguinning): > > ?#--------- command -------------# > Widget::blastx: > /usr/bin/blastall -p blastx -d /data/ge/tmp/maker_VfDQQU/hsap_ensembl%2Efa.mpi.10.6 -i /data/ge/tmp/maker_VfDQQU/0/Contig20.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 4 -U -F T -I T -o /data/ge/round3/cg.maker.output/cg_datastore/56/AC/Contig20//theVoid.Contig20/0/Contig20.0.hsap_ensembl%2Efa.blastx.temp_dir/hsap_ensembl%2Efa.mpi.10.6.blastx > #-------------------------------# > deleted:511 hits > doing blastx of proteins > open3: fork failed: Cannot allocate memory at /home/jmgps/software/maker/bin/../lib/File/NFSLock.pm line 1037. 
> --> rank=NA, hostname=Planarian2 > ERROR: Failed while doing blastx of proteins > ERROR: Chunk failed at level:8, tier_type:3 > FAILED CONTIG:Contig20 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:Contig20 > > examining contents of the fasta file and run log > ERROR: could not make datastore directory > --> rank=NA, hostname=Planarian2 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:Contig22 > > examining contents of the fasta file and run log > ERROR: could not make datastore directory > --> rank=NA, hostname=Planarian2 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:Contig24 > > examining contents of the fasta file and run log > ERROR: could not make datastore directory > --> rank=NA, hostname=Planarian2 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:Contig26 > > examining contents of the fasta file and run log > ERROR: could not make datastore directory > --> rank=NA, hostname=Planarian2 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:Contig28? > > I'm totally lost here, I think it is still processing contigs, but the FAILED attemps slow down the whole process, and we are in a hurry due to the maintenance of the machine. And I can't understand the source of the ERROR. > > I will be more than happy to provide more details about the problem, if requested. > > Thanks a lot for the help! -------------- next part -------------- An HTML attachment was scrubbed... 
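
Carson's second point in the reply above (at least 1GB of RAM per MAKER process, with 2-3GB being safer) amounts to a quick calculation that can be done before choosing how many processes to launch. The following is only a sketch of that arithmetic in Python; the function name, the 3GB default, and the 32GB figure in the note below are illustrative assumptions and are not part of MAKER.

```python
def max_maker_processes(total_ram_gb, cores, ram_per_process_gb=3.0):
    """Estimate how many MAKER processes fit in RAM, capped at the core count.

    ram_per_process_gb defaults to the upper end of the "2-3GB is safer"
    range mentioned in the thread; 1.0 would be the stated minimum.
    """
    if total_ram_gb <= 0 or cores <= 0 or ram_per_process_gb <= 0:
        raise ValueError("all arguments must be positive")
    # Whole processes that fit in the available RAM.
    by_ram = int(total_ram_gb // ram_per_process_gb)
    # Never suggest more processes than cores, and never fewer than one
    # (one process is the least that can be run at all).
    return max(1, min(cores, by_ram))
```

On a hypothetical 16-core machine with 32GB of RAM, max_maker_processes(32, 16) suggests 10 processes at 3GB each; all 16 cores would only be usable at the 1GB minimum. The result would be the value passed to the mpiexec -n flag, or the number of separate jobs if MPI is not used.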
URL: From carsonhh at gmail.com Tue Mar 14 10:16:25 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 14 Mar 2017 10:16:25 -0600 Subject: [maker-devel] Differences in non_overlapping protein file between runs In-Reply-To: <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com> References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com> <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com> Message-ID: <9EC90572-7E3F-4B07-9098-6CAFD7B3A4B0@gmail.com> I see you have both masked and unmasked augustus calls, so you may have a lot of non-masked predictions in your second run that are entirely contained in transposons and repeat regions (that is why they do not overlap). Really the easiest thing to do would be to open the results in a browser, find one of the ones listed as non-overlapping, and then look at it to see why it is not overlapping. You can then look at that specific location directly in the file as needed, but it will be much easier to interpret looking at the features drawn in a browser (like Apollo - desktop version). ?Carson > On Mar 10, 2017, at 3:53 AM, YannDussert wrote: > > Hi, > > Thank you for your answer.To get my gff with ab-initio predictions, I just took the corresponding lines in the maker gff from the previous round. > > I can't see any problem with it, it looks like this: > > Plvit001 augustus_masked match 66626 70338 0.85 + . ID=Plvit001:hit:12095:4.5.0.0;Name=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 > Plvit001 augustus_masked match_part 66626 67586 0.85 + . ID=Plvit001:hsp:27621:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1 961 +;Gap=M961 > Plvit001 augustus match 66626 70338 1 + . ID=Plvit001:hit:12088:4.5.0.0;Name=augustus-Plvit001-abinit-gene-0.0-mRNA-1 > Plvit001 augustus match_part 66626 70096 1 + . 
ID=Plvit001:hsp:27610:4.5.0.0;Parent=Plvit001:hit:12088:4.5.0.0;Target=augustus-Plvit001-abinit-gene-0.0-mRNA-1 1 3471 +;Gap=M3471 > Plvit001 augustus_masked match_part 68166 68486 0.85 + . ID=Plvit001:hsp:27622:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 962 1282 +;Gap=M321 > Plvit001 augustus_masked match_part 69504 70096 0.85 + . ID=Plvit001:hsp:27623:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1283 1875 +;Gap=M593 > Plvit001 augustus_masked match_part 70174 70338 0.85 + . ID=Plvit001:hsp:27624:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1876 2040 +;Gap=M165 > > > Best regards, > > Yann > > On 09/03/2017 18:52, Carson Holt wrote: >> My guess is that there is either an issue with the GFF3 file you supplied, so its features are not overlapping anything. >> >> ?Carson >> >> >>> On Mar 6, 2017, at 9:51 AM, YannDussert wrote: >>> >>> Hello, >>> >>> First, thank you for developing MAKER, this is a great annotation tool! >>> >>> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1. >>> >>> After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000-17000 genes (numbers from other annotated oomycetes). There was only around 1500 genes in the non_overlapping protein file. After looking at the annotation on a genome browser, one of the problems was apparently gene fusions due to bad protein evidence. 
Following the advice on another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence, did I miss something? >>> >>> Thank you in advance for your answer. >>> >>> Best regards, >>> Yann >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 14 10:17:58 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 14 Mar 2017 10:17:58 -0600 Subject: [maker-devel] Est2Genome Problems In-Reply-To: <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com> References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com> <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com> Message-ID: Sure. Send me the file. On a side note, I find cufflinks results to be very noisy (lot?s of false positives). I usually get better results using assembled reads from Trinity (with -jaccard_clip option set), or using Stringtie. Thanks, Carson > On Mar 10, 2017, at 1:50 AM, Christoph Hahn wrote: > > Dear Carson, > > Thanks for getting in touch! I actually managed in the end. I converted the gtf I had from cufflinks to gff3 via the script 'gtf2gff.pl' from augustus and then used the script 'gffGetmRNA.pl' again from augustus to extract the mRNA in fasta. This file I fed to MAKER via the 'est=' route and now I get plenty of est2genome evidence in the maker result. 
So the problem seems to be limited to the route 'est_gff=', allthough there is no error message whatsoever the est2genome routine seems to never be triggered. > > I'd still be happy to upload my data (the cufflinks gff, the genome fasta, anything else?) if you want to try to reproduce the problem. Let me know! > > btw I seem to be unable to create a new topic or respond to topics via google groups. Is the list closed or the access restricted somehow. I only managed by responding to Jason's mail which I still had in my inbox directly via my gmail. > > Thanks! > > cheers, > Christoph > > On 09/03/2017 19:39, Carson Holt wrote: >> Jason never responded back to this one or uploaded his file to test. He probably figured it out off list. My guess is that your results are too fragmented to build a model that can pass filtering thresholds with. >> >> If you want I can take a look. You can upload all files for a test job here ?> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> >> ?Carson >> >> >> >>> On Mar 7, 2017, at 5:51 PM, Christoph Hahn > wrote: >>> >>> Hi MAKER community, >>> >>> I think I am seeing the same issue that Jason has reported. ran cufflinks, then cufflinks2gff3 and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting gff file from maker I only get protein2genome and repeatmasker evidence. If I do a search in the maker log est2genome never comes up. Tried to extract the cufflinks results as fasta and feed to MAKER via 'est='. Still no indication that the evidence is used. >>> >>> I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time! >>> >>> cheers, >>> Christoph >>> >>> On 10/02/2015 17:56, Carson Holt wrote: >>>> I ran a few est2genome runs with a cufflinks file i just generated and did not get any issues for EST based gene models. >>>> >>>> I?d like to at least have your test set to see if I can duplicate what you are seeing. 
>>>> >>>> Use this to upload the job files then I can just run it from my server here ?> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>>> >>>> ?Carson >>>> >>>> >>>>> On Feb 3, 2015, at 11:13 AM, Jason Gallant > wrote: >>>>> >>>>> Hi Folks, >>>>> >>>>> I?ve nearly succeeded at getting MAKER to run on AWS? I?ve been checking the output files, and have noticed that none of my RNAseq data was incorporated on the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting gff3 files. I even ran the accessory script gff3merge to check that the resulting file was properly formatted. >>>>> >>>>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting maker gff3 file, and the genes predicted by these. Is there another option that I need to enable in order to use my est_gff file? I?m trying to get a set of genes to train the predictors for my next step. >>>>> >>>>> Any help would (as always) be greatly appreciated! >>>>> >>>>> Best, >>>>> Jason Gallant >>>>> >>>>> ? >>>>> Dr. Jason R. 
Gallant >>>>> Assistant Professor >>>>> Room 38 Natural Sciences >>>>> Department of Zoology >>>>> Michigan State University >>>>> East Lansing, MI 48824 >>>>> jgallant at msu.edu >>>>> office: 517-884-7756 >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaymik at tgen.org Tue Mar 14 11:29:49 2017 From: mnaymik at tgen.org (Marcus Naymik) Date: Tue, 14 Mar 2017 10:29:49 -0700 Subject: [maker-devel] ThrowNullPointerException() In-Reply-To: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com> References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com> Message-ID: I have now tried with multiple versions of blast (2.6 and 2.28 binaries and built from source) and get the same error: setting up GFF3 output and fasta chunks doing blastn of ESTs running blast search. 
#--------- command -------------#
Widget::blastn:
/home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db /scratch/mnaymik/maker/tmp/maker_cah
#-------------------------------#
Error: NCBI C++ Exception:
"/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr

Error: NCBI C++ Exception:
"/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr

examining contents of the fasta file and run log
ERROR: BLASTN failed
--> rank=87, hostname=pnap-pe7-s09
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:6537645

ERROR: BLASTN failed
--> rank=88, hostname=pnap-pe7-s09
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:6537659

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:6537645

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:6537659

On Thu, Mar 2, 2017 at 1:25 PM, Carson Holt wrote:
> Try reinstalling blast, or upgrade to a newer version of blast.
>
> --Carson
>
>
> On Mar 2, 2017, at 1:05 PM, Marcus Naymik wrote:
>
>
> I have maker running with MPI and I get this error over and over again for
> every contig. Any Ideas?
>
>
> MAKER WARNING: All old files will be erased before continuing
>
> #---------------------------------------------------------------------
>
> Now starting the contig!!
>
> SeqID: 5239
>
> Length: 1395
>
> #---------------------------------------------------------------------
>
>
>
> Error: NCBI C++ Exception:
>
> "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp",
> line 925: Criti
>
>
>
> *This electronic message is intended to be for the use only of the named
> recipient, and may contain information that is confidential or privileged,
> including patient health information. If you are not the intended
> recipient, you are hereby notified that any disclosure, copying,
> distribution or use of the contents of this message is strictly prohibited.
> If you have received this message in error or are not the named recipient, > please notify us immediately by contacting the sender at the electronic > mail address noted above, and delete and destroy all copies of this > message. Thank you.* > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -- *This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 14 11:36:07 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 14 Mar 2017 11:36:07 -0600 Subject: [maker-devel] ThrowNullPointerException() In-Reply-To: References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com> Message-ID: The error itself is coming from BLAST. MAKER does provide the command used, so you can try it outside of MAKER. You can submit the files used as well as command used to the BLAST developers for them to test with. MAKER deletes files on failure, but if you edit the ?/maker/lib/GI.pm, you can stop it from deleting files. Edit line 58 by setting CLEANUP => 0 Then you should be able to grab whatever files maker used to run blast, and copy the blast command used from STDERR. 
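A minimal sketch of the last step Carson describes (copying the BLAST command from STDERR), assuming MAKER's STDERR was captured to a text file and that each command sits between the `#--------- command -------------#` and `#-------------------------------#` marker lines, as in the log quoted in this thread. The function name `extract_commands` and the sample paths are hypothetical and not part of MAKER. The command in the quoted log appears to be cut off, so a real capture would need to keep the full line.

```python
def extract_commands(log_text):
    """Return (widget, command) pairs found between MAKER's command markers.

    Assumed layout (taken from the log quoted in this thread):
        #--------- command -------------#
        Widget::blastn:
        /path/to/blastn -db ...
        #-------------------------------#
    """
    commands = []
    inside = False
    block = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#") and "command" in line and line.endswith("#"):
            # Opening marker: start collecting a new block.
            inside, block = True, []
        elif inside and line.startswith("#---"):
            # Closing marker: first collected line is the widget name,
            # the remaining lines are the command itself.
            inside = False
            if block:
                widget = block[0].rstrip(":")
                commands.append((widget, " ".join(block[1:])))
        elif inside and line:
            block.append(line)
    return commands


if __name__ == "__main__":
    # Hypothetical sample; a real log would be read from the captured STDERR file.
    sample = (
        "running blast search.\n"
        "#--------- command -------------#\n"
        "Widget::blastn:\n"
        "/opt/blast/bin/blastn -db /tmp/db -query /tmp/query.fa\n"
        "#-------------------------------#\n"
    )
    for widget, command in extract_commands(sample):
        print(widget, "->", command)
```

The extracted command line can then be pasted into a shell and rerun by hand against the files that MAKER left behind once CLEANUP is set to 0.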
--Carson

From michael.s.campbell1 at gmail.com Tue Mar 14 20:27:10 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 14 Mar 2017 22:27:10 -0400
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <2017030519265949065818@cau.edu.cn>
References: <2017030519265949065818@cau.edu.cn>
Message-ID: <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com>

Hi Chao Chao,

I've not run into this before. Could you post the RepeatModeler command you used?

Thanks,
Mike

> On Mar 5, 2017, at 6:26 AM, dcg at cau.edu.cn wrote:
>
> Dear sir:
> Before the MAKER operations, I do repeat masking first on my contigs.
> However, when I followed "Repeat Library Construction-Advanced", no results were generated after running LTRharvest, so I couldn't go any further.
>
> When I attempted to follow "Repeat Library Construction-Basic" to run RepeatModeler, a note caught my attention, even though RECON returned some results:
> NOTE: RepeatScout did not return any models.
>
> Is the situation above normal in the masking process? How can I deal with these problems to make a high-quality repeat library for my assembled contigs?
>
> Hope to hear from you.
> Best wishes!
>
> Chao Chao
> 2017.03.05

From dcg at cau.edu.cn Wed Mar 15 08:26:15 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Wed, 15 Mar 2017 22:26:15 +0800
Subject: [maker-devel] How to get Pseudogene
Message-ID: <2017031522261575294011@cau.edu.cn>

Dear sir:
I'd like to mask some pseudogenes in my annotation. How can I do it?
In the guide, the first step is "Run a tblastn of the protein sequence (query) vs. the intergenic genome sequence (subject/database)".
My question is: what do "the protein sequence" and "the intergenic genome sequence" each refer to? My own protein database? And how do I use the result in the MAKER annotation?

Best wishes!

Chao Chao
2017.03.15

From michael.s.campbell1 at gmail.com Wed Mar 15 09:00:13 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 15 Mar 2017 11:00:13 -0400
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <201703152048212561203@cau.edu.cn>
References: <2017030519265949065818@cau.edu.cn> <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com> <201703152048212561203@cau.edu.cn>
Message-ID: <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com>

Hi Chao Chao,

I'm not sure how to troubleshoot this if there were no error messages. I've cc'd a couple of people who have worked with this protocol much more than I have.

Ning and Kevin,
Do you have any tips for running these tools that may help Chao Chao?

Thanks,
Mike

> On Mar 15, 2017, at 8:48 AM, dcg at cau.edu.cn wrote:
>
> Thanks for your reply!
> I just followed the guide at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced
>
> To use LTRharvest, my commands are as below (the file names were set to my liking):
> DIR1/gt suffixerator -db seqfile -indexname seqfileindex -tis -suf -lcp -des -ssp -dna
> DIR1/gt ltrharvest -index seqfileindex -out seqfile.out99 -outinner seqfile.outinner99 -gff3 seqfile.gff99 -minlenltr 100 \
> -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > seqfile.result99
> No error, but no results either.
>
> Chao Chao
> 2017.03.15

From mnaymik at tgen.org Wed Mar 15 10:54:48 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Wed, 15 Mar 2017 09:54:48 -0700
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To:
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
Message-ID:

Thanks, you're right. I had to recompile blast from source with this flag: -std=c++0x

From carsonhh at gmail.com Wed Mar 15 11:00:18 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 15 Mar 2017 11:00:18 -0600
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To:
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
Message-ID: <6A6C819F-D903-401A-8522-29FEBC955F17@gmail.com>

Glad I could help. Remember to switch back to CLEANUP => 1 if you set it to 0 to debug. Otherwise you will have a lot of files left in /tmp after each MAKER run.

--Carson

From jiangn at msu.edu Wed Mar 15 09:56:30 2017
From: jiangn at msu.edu (Jiang, Ning)
Date: Wed, 15 Mar 2017 15:56:30 +0000
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com>
References: <2017030519265949065818@cau.edu.cn> <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com> <201703152048212561203@cau.edu.cn>, <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com>
Message-ID:

Hi Chao Chao,

I guess you have an extra "\" in your second command. We put that sign there to indicate that the entire thing belongs to one command (it is too long to fit on one line). I suggest you remove the "\" and try again.

Good luck!

Ning Jiang
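Ning's point about the "\" can be illustrated with a small sketch. It uses Python's `shlex` module as a stand-in for POSIX shell word splitting, which is an approximation and not the shell itself, and the helper name `to_argv` is hypothetical. The option values are the ones from the quoted LTRharvest command. Whether a stray backslash actually caused the empty output is Ning's guess and is not confirmed in the thread.

```python
import shlex


def to_argv(command_text):
    """Split a command into arguments roughly the way a POSIX shell would.

    A backslash immediately followed by a newline is a line continuation and
    is removed before splitting. shlex does not do this by itself, so it is
    handled explicitly here.
    """
    joined = command_text.replace("\\\n", "")
    return shlex.split(joined)


if __name__ == "__main__":
    # Two-line form: the backslash is the last character on the first line.
    two_lines = "gt ltrharvest -index seqfileindex -minlenltr 100 \\\n -maxlenltr 6000"
    # One-line form with the backslash left in: it now escapes the following
    # space, so the next option arrives with a leading blank.
    stray = "gt ltrharvest -index seqfileindex -minlenltr 100 \\ -maxlenltr 6000"

    print(to_argv(two_lines))
    print(to_argv(stray))
```

In the second case the option name no longer begins with "-", so a program that parses its arguments strictly may not recognise it. Typing the whole command on one line without the "\" avoids the question entirely.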

From carsonhh at gmail.com Thu Mar 16 09:19:02 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 16 Mar 2017 09:19:02 -0600
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To:
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu> <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com> <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com> <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com> <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com> <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com>
Message-ID: <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>

Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR (or hint-based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied.

Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but are only there for reference purposes (there will also be a raw fasta from each source, but only for reference purposes).

All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail in the MAKER wiki and the MAKER2 paper on how models compete. So the final result is not a superset, rather a merged subset from each potential source.

EVM is not used to obtain a consensus gene model. Its results compete just like all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn't work it produces either no model or partial models.

--Carson

> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>
> Dear Carson,
>
> thank you so much! I am now peeking into the results for the finished scaffolds. In the gff file, the gene id confuses me a bit. In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm", "augustus" etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?
>
> Best Regards,
> Ray
>
> Dr. Rongfeng (Ray) Cui
> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
> Wissenschaftlicher MA / Postdoctoral researcher
> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
> Tel.:+49 (0)221 496
> Mobile: +49 0221 37970 496
> rcui at age.mpg.de
> www.age.mpg.de
>
> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
> Maybe. I haven't tested this, but it should work. Maker supports labels for input by placing a ":" and a label after each file name.
>
> Example -->
> est=file1.fasta:label_1,file2.fasta:label_2
>
> If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.
>
> As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3 -->
> evmtrans:est2genome:label_1=10
>
> I don't know if the label will force any raw analysis to rerun, but it shouldn't.
>
> --Carson
>
>> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote:
>>
>> Hi Carson,
>>
>> currently I am partitioning the protein evidence based on phylogenetic relationship into several datasets, supplied as a comma-delimited list. Is it possible then to specify a higher weight for protein2genome models from more closely related species than from more distant taxa?
>>
>> Ray
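The distinction Carson draws at the top of this message, between final models with source maker and raw results carried along for reference, can be checked in a GFF3 file by tallying column 2 (source) against column 3 (type). The following is only a sketch under that reading: `tally_gff3` is a hypothetical helper, the sample records are made up, and it treats every row whose source is not exactly "maker" as reference material, which is a simplification.

```python
from collections import Counter


def tally_gff3(lines):
    """Count (source, type) pairs, split into final MAKER models and the rest.

    Column 2 is the source and column 3 is the feature type. Rows with source
    'maker' are counted as final models; all other rows are counted as raw or
    reference results.
    """
    final, raw = Counter(), Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a complete 9-column feature row
        source, feature_type = cols[1], cols[2]
        target = final if source == "maker" else raw
        target[(source, feature_type)] += 1
    return final, raw


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        "##gff-version 3",
        "scaf1\tmaker\tgene\t1\t900\t.\t+\t.\tID=maker-scaf1-snap-gene-0.1",
        "scaf1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=maker-scaf1-snap-gene-0.1",
        "scaf1\tsnap\tmatch\t1\t900\t.\t+\t.\tID=s1",
        "scaf1\tsnap\tmatch_part\t1\t400\t.\t+\t.\tID=s1p;Parent=s1",
    ]
    final, raw = tally_gff3(sample)
    print("final:", dict(final))
    print("raw:", dict(raw))
```

If input files were labelled as described in the quoted reply, the labelled sources (for example est2genome:label_1) would be expected to show up as separate column 2 values in the second tally, though Carson notes that this has not been tested.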
Rongfeng (Ray) Cui >> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >> Wissenschaftlicher MA / Postdoctoral researcher >> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >> Tel.:+49 (0)221 496 >> Mobile: +49 0221 37970 496 <> >> rcui at age.mpg.de >> www.age.mpg.de >> >> >> >> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui > wrote: >> Dear Carson, >> >> thank you for the pointers! Before running the first round of Maker, I mapped conspecific Trinity assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained Genemark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see. >> >> It is good to know that MAKER will reuse the old results. >> >> Best Regards, >> Ray >> >> Dr. Rongfeng (Ray) Cui >> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >> Wissenschaftlicher MA / Postdoctoral researcher >> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >> Tel.:+49 (0)221 496 >> Mobile: +49 0221 37970 496 <> >> rcui at age.mpg.de >> www.age.mpg.de >> >> >> >> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt > wrote: >> You can find lots of info in the devel archives on training. Example ?> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI >> >> Also example of training SNAP on the wiki ?> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors >> >> MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them. 
The exception to this are the exonerate alignments. They are generated relatively quickly compared to the BLAS T runs, so rerunning them is not too much overhead. Also they are not archived because doing so created IO issues (exonerate is not running in bulk batches like BLAST, rather as multiple small separate runs for each polished read, and archiving a lot of small raw reports can occur so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate rather than develop a database like bundling/compression mechanism to get around the IO issues. >> >> Thanks, >> Carson >> >> >>> On Mar 14, 2017, at 10:44 AM, Ray Cui > wrote: >>> >>> Hi Carson, >>> Thanks for your prompt response! >>> >>> I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models? >>> After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc) in the second iteration, since they are exactly the same as the first run. Perhaps maker can take in the alignment results from the previous run? >>> >>> Best Regards, >>> Ray >>> >>> Dr. Rongfeng (Ray) Cui >>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>> Wissenschaftlicher MA / Postdoctoral researcher >>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>> Tel.:+49 (0)221 496 >>> Mobile: +49 0221 37970 496 <> >>> rcui at age.mpg.de >>> www.age.mpg.de >>> >>> >>> >>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui > wrote: >>> I see. 
If my evm config looks like this: >>> evmab=5 #default weight for source unspecified ab initio predictions >>> evmab:snap=5 #weight for snap sourced predictions >>> evmab:augustus=10 #weight for augustus sourced predictions >>> evmab:fgenesh=10 #weight for fgenesh sourced predictions >>> evmab:genemark=5 #weight for genemark sourced predictions >>> >>> and Column 2 in the genemark.gff is "GeneMark.hmm" , then the value from "evmab" (=5) will be used, is that correct? >>> >>> Best Regards, >>> Ray >>> >>> Dr. Rongfeng (Ray) Cui >>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>> Wissenschaftlicher MA / Postdoctoral researcher >>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>> Tel.:+49 (0)221 496 >>> Mobile: +49 0221 37970 496 <> >>> rcui at age.mpg.de >>> www.age.mpg.de >>> >>> >>> >>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt > wrote: >>> Column 2 in the GFF3 file is the source column. It is used to specify the source fo the data. That column will also be used by EVM to bin features by their source and apply weights based on source. >>> >>> ?Carson >>> >>>> On Mar 14, 2017, at 10:26 AM, Ray Cui > wrote: >>>> >>>> Thanks! I didn't know you can also name the gff, but I think using the default is fine, that's what I'm doing now. >>>> >>>> >>>> Best Regards, >>>> Ray >>>> >>>> Dr. Rongfeng (Ray) Cui >>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>> Wissenschaftlicher MA / Postdoctoral researcher >>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>> Tel.:+49 (0)221 496 >>>> Mobile: +49 0221 37970 496 <> >>>> rcui at age.mpg.de >>>> www.age.mpg.de >>>> >>>> >>>> >>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt > wrote: >>>> >>>> These are set in the maker_evm.ctl file. 
>>>>
>>>> Use whatever you used in the source column of the input GFF3. For example, if column 2 is set as GENEMARK, then do this -->
>>>> evmab:GENEMARK=7
>>>>
>>>> This also works -->
>>>> evmab:pred_gff:GENEMARK=7
>>>>
>>>> Or just set the default -->
>>>> evmab=7
>>>>
>>>> --Carson
>>>>
>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui wrote:
>>>>>
>>>>> Dear Carson,
>>>>>
>>>>> I think it may be most straightforward to input the GFF3 instead.
>>>>>
>>>>> What is the correct way of setting a weight for the EVM step for the GFF3 models passed through the pred_gff option?
>>>>>
>>>>> Ray
>>>>>
>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt wrote:
>>>>> It may work as is, as long as you don't need any of the additional options that have been added. If not, you can also just run it outside of MAKER and then provide the result in GFF3 format to pred_gff.
>>>>>
>>>>> --Carson
>>>>>
>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui wrote:
>>>>>>
>>>>>> I see. Are there any plans to incorporate it into Maker soon?
>>>>>>
>>>>>> If not, I could try to see if I can adapt the current Maker script.
>>>>>>
>>>>>> Ray
>>>>>>
>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt wrote:
>>>>>> Yes. This is a recent update. It's an attempt to merge GeneMark-ET and GeneMark-EP into GeneMark-ES scripts.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui wrote:
>>>>>>>
>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap.
>>>>>>>
>>>>>>> I think there must have been a big update between different Genemark versions. It seems that they now also support evidence being fed into the prediction stage.
>>>>>>>
>>>>>>> The name of the latest version of the genemark script has been changed to "gmes_petap.pl", with the following command line options:
>>>>>>>
>>>>>>> Usage: /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] --sequence [filename]
>>>>>>>
>>>>>>> GeneMark-ES Suite version 4.33
>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
>>>>>>>
>>>>>>> Input sequence/s should be in FASTA format
>>>>>>>
>>>>>>> Algorithm options
>>>>>>>   --ES to run self-training
>>>>>>>   --fungus to run algorithm with branch point model (most useful for fungal genomes)
>>>>>>>   --ET [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
>>>>>>>   --et_score [number]; 4 (default) minimum score of intron in initiation of the ET algorithm
>>>>>>>   --evidence [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
>>>>>>>   --training_only to run only training step
>>>>>>>   --prediction_only to run only prediction step
>>>>>>>   --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)
>>>>>>>
>>>>>>> Sequence pre-processing options
>>>>>>>   --max_contig [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
>>>>>>>   --min_contig [number]; 50000 (default); will ignore contigs shorter then min_contig in training
>>>>>>>   --max_gap [number]; 5000 (default); will split sequence at gaps longer than max_gap
>>>>>>>       Letters 'n' and 'N' are interpreted as standing within gaps
>>>>>>>   --max_mask [number]; 5000 (default); will split sequence at repeats longer then max_mask
>>>>>>>       Letters 'x' and 'X' are interpreted as results of hard masking of repeats
>>>>>>>   --soft_mask [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
>>>>>>>
>>>>>>> Run options
>>>>>>>   --cores [number]; 1 (default) to run program with multiple threads
>>>>>>>   --pbs to run on cluster with PBS support
>>>>>>>   --v verbose
>>>>>>>
>>>>>>> Customizing parameters:
>>>>>>>   --max_intron [number]; default 10000 (3000 fungi), maximum length of intron
>>>>>>>   --max_intergenic [number]; default 10000, maximum length of intergenic regions
>>>>>>>   --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
>>>>>>>
>>>>>>> Developer options:
>>>>>>>   --usr_cfg [filename]; to customize configuration file
>>>>>>>   --ini_mod [filename]; use this file with parameters for algorithm initiation
>>>>>>>   --test_set [filename]; to evaluate prediction accuracy on the given test set
>>>>>>>   --key_bin
>>>>>>>   --debug
>>>>>>> # -------------------
>>>>>>>
>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt wrote:
>>>>>>> Also note that the gmhmme3 executable distributed with different flavors of genemark has had the same name but has been quite different in both command line structure and output between flavors.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui wrote:
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters automatically set by Maker when calling Genemark?
>>>>>>>> If you can point me to the part of the Maker source code that constructs the final genemark command line, I can also take a look.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Ray
>>>>>>>>
>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt wrote:
>>>>>>>> The names of scripts used are listed in the maker_exe.ctl file. It depends on if formatting or any flags have changed between versions.
>>>>>>>>
>>>>>>>> --Carson
>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui wrote:
>>>>>>>>>
>>>>>>>>> Dear Carson,
>>>>>>>>>
>>>>>>>>> I have now run GeneMark-ET, and it produces a trained .mod file. I think it can then be passed to Maker. Do you know what final command line Maker constructs to call genemark? Genemark-ET and -ES use the same perl script, so one probably only needs to use the --prediction and --predict_with xxx.mod options to predict genes using the species specific parameters (bypassing regular training and prediction steps).
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Ray
>>>>>>>>>
>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt wrote:
>>>>>>>>> MAKER support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER related archive posts etc. will be related to GeneMark-ES.
>>>>>>>>>
>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don't know how this compares to GeneMark-ET.
>>>>>>>>>
>>>>>>>>> --Carson
>>>>>>>>>
>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> thanks! It seems that Genemark-ET has a "--training" flag. Is that the flag I should use when training, or should I just let Genemark also perform the prediction?
>>>>>>>>>>
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel wrote:
>>>>>>>>>> Hi Ray,
>>>>>>>>>>
>>>>>>>>>> I think you're on the right track with training Genemark with RNAseq data. It should only change the training steps, which are external to MAKER, but not how MAKER runs Genemark. You'll still give MAKER the path to the "es.mod" file made by Genemark.
>>>>>>>>>>
>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence.
>>>>>>>>>>
>>>>>>>>>> ~Daniel
>>>>>>>>>>
>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I have successfully installed Maker beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictors.
>>>>>>>>>>> When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate Genemark-ET predictions into Maker. Should I run Genemark-ET independently of Maker, then integrate the GFF at some point during the Maker process? If so, how should I edit the configuration file? Currently Maker has an option called "gmhmm". Should I then train GeneMark by myself with RNAseq data, then feed the hmm to Maker?
>>>>>>>>>>>
>>>>>>>>>>> And perhaps an unrelated question: Maker beta 3 now supports EVM. I'm wondering how EVM is used by Maker (at which step, and what it does), and how it differs from what Maker itself is designed for (both reconcile different gene models).
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Ray
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> maker-devel mailing list
>>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From rcui at age.mpg.de Thu Mar 16 10:02:08 2017
From: rcui at age.mpg.de (Ray Cui)
Date: Thu, 16 Mar 2017 17:02:08 +0100
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To: <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu> <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com> <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com> <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com> <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com> <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com> <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
Message-ID:

Dear Carson,

thank you for the explanation! Now I see why it sometimes seems that EVM doesn't produce any model for a particular cluster.

Best Regards,
Ray

On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt wrote:

> Final results with source maker will be of type gene/mRNA/exon/CDS. They
> have been further processed beyond the raw results, and may include
> extensions such as the addition of UTR for example (or hint based
> recomputation in the case of SNAP and Augustus). The gene ID of the maker
> model will let you know the source before additional processing was
> applied. Raw results will also be in the file as type match/match_part and
> source evm/snap/augustus, but are only there for reference purposes (there
> will also be a raw fasta from each source, but only for reference
> purposes). All models compete against each other, and the one best matching
> the evidence is kept. So if SNAP or Augustus scores better than EVM, then
> that model will be kept for that locus.
> You can find more detail in the MAKER wiki and the MAKER2 paper for how
> models compete.
>
> So the final result is not a superset, rather a merged subset from each
> potential source.
>
> EVM is not used to obtain a consensus gene model. Its results compete just
> like all other algorithms. This is because when EVM works it produces
> beautiful models that score really well, but when it doesn't work it
> produces either no model or partial models.
>
> --Carson
>
> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>
> Dear Carson,
>
> thank you so much! I am now peeking into the results for the
> finished scaffolds. In the gff file, the gene id confuses me a bit. In this
> file, column 2 is always "maker", but the "ID" attribute in the annotation
> is prefixed with "snap", "maker", "evm", "augustus" etc. Does that mean
> the final annotation is a superset of all gene predictors? If EVM was used
> to obtain a consensus gene model, why would the other models still show up
> in the final result set?
>
> Best Regards,
> Ray
>
> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
>
>> Maybe. I haven't tested this, but it should work. Maker supports labels
>> for input by placing a ":" and a label after each file name.
>>
>> Example -->
>> est=file1.fasta:label_1,file2.fasta:label_2
>>
>> If you label your files, then the label will go into the GFF3. So instead
>> of est2genome in column 2, you will get est2genome:label_1 in column 2.
>>
>> As a result, you should be able to add that label to the EVM settings
>> like so and it will match column 2 of the GFF3 -->
>> evmtrans:est2genome:label_1=10
>>
>> I don't know if the label will force any raw analysis to rerun, but
>> it shouldn't.
>>
>> --Carson
>>
>> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote:
>>
>> Hi Carson,
>>
>> currently I am partitioning the protein evidence based on
>> phylogenetic relationship into several datasets, supplied as a
>> comma-delimited list. Is it possible then to specify a higher weight for
>> protein2genome models from more closely related species than for more
>> distantly related taxa?
>>
>> Ray
>>
>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui wrote:
>>
>>> Dear Carson,
>>>
>>> thank you for the pointers! Before running the first round of
>>> Maker, I mapped conspecific Trinity assembled proteins (long, "full length"
>>> subset) to an earlier version of the genome assembly using my own pipeline
>>> and trained Augustus and SNAP that way. I also trained Genemark-ET using
>>> TopHat alignments per their instructions. I'm wondering if it will be worth
>>> doing a second round, but I guess I will see.
>>>
>>> It is good to know that MAKER will reuse the old results.
>>>
>>> Best Regards,
>>> Ray
>>>
>>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt wrote:
>>>
>>>> You can find lots of info in the devel archives on training. Example -->
>>>> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>>>>
>>>> Also example of training SNAP on the wiki -->
>>>> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>>>>
>>>> MAKER will reuse old raw results if you rerun in the same directory
>>>> (only deleting what would be different given altered settings between
>>>> runs). It will see the existing alignments archived in the datastore as raw
>>>> reports and just reuse them. The exception to this is the exonerate
>>>> alignments. They are generated relatively quickly compared to the BLAST
>>>> runs, so rerunning them is not too much overhead. Also they are not
>>>> archived because doing so created IO issues (exonerate is not running in
>>>> bulk batches like BLAST, rather as multiple small separate runs for each
>>>> polished read, and archiving a lot of small raw reports can occur so fast
>>>> when using MPI that it crashes storage servers). So we decided to just not
>>>> archive exonerate rather than develop a database-like bundling/compression
>>>> mechanism to get around the IO issues.
>>>>
>>>> Thanks,
>>>> Carson
From carsonhh at gmail.com Thu Mar 16 11:30:16 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 16 Mar 2017 11:30:16 -0600
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To:
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu> <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com> <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com> <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com> <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com> <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com> <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
Message-ID:

1. Verify that the issue is not being caused by hints from evidence (i.e. that you aren't feeding in fused mRNA-seq assemblies or protein evidence). Fused evidence will result in hints that fuse models.

2. If you still have an issue, then drop SNAP. Not all predictors work well on all genomes.

Also, no one can post to the Google group. It's just for archival. All messages have to go to the mailing list here, and they then get archived on Google -> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

The mailing list logs show that you requested to unsubscribe earlier today.

--Carson

> On Mar 16, 2017, at 11:22 AM, Ray Cui wrote:
>
> Hi Carson,
>
> For some reason I can't seem to post on the Google group anymore.
>
> After looking at the results, it appears that SNAP performs poorly compared to genemark-ET and augustus. It looks like it's very prone to fusing neighboring genes and producing false positives. Is that a general thing you see in vertebrate genomes with SNAP? I saw that you didn't recommend SNAP for primates; perhaps the issue is similar?
>
> Attached you can see a screenshot of the IGV browser, with all evidence tracks separated.
>
> Ray
>
> Dr.
Rongfeng (Ray) Cui
> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
> Wissenschaftlicher MA / Postdoctoral researcher
> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
> Tel.:+49 (0)221 496
> Mobile: +49 0221 37970 496
> rcui at age.mpg.de
> www.age.mpg.de
>
> On Thu, Mar 16, 2017 at 5:02 PM, Ray Cui wrote:
> Dear Carson,
>
> Thank you for the explanation! Now I see why it sometimes seems that EVM doesn't produce any model for a particular cluster.
>
> Best Regards,
> Ray
>
> On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt wrote:
> Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTRs (or hint-based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied. Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but are only there for reference purposes (there will also be a raw fasta from each source, but only for reference purposes). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail on how models compete in the MAKER wiki and the MAKER2 paper.
>
> So the final result is not a superset, but rather a merged subset from each potential source.
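
As a rough illustration of the point above (final models have source "maker" in column 2, while the gene ID hints at which predictor the model came from), the following minimal sketch tallies final gene features by the leading token of their ID. The exact ID layout is an assumption here: the sketch only assumes that the ID starts with a predictor name such as "snap", "evm", "augustus" or "maker", followed by a non-alphanumeric separator. The function names are hypothetical and not part of MAKER.

```python
import re
from collections import Counter


def id_prefix(attributes):
    """Return the leading alphanumeric token of the ID attribute, or None."""
    for field in attributes.split(";"):
        key, _, value = field.strip().partition("=")
        if key == "ID":
            match = re.match(r"[A-Za-z0-9]+", value)
            return match.group(0).lower() if match else None
    return None


def count_gene_sources(gff3_lines):
    """Count final gene features (column 2 == 'maker') by ID prefix."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        # Raw predictions are match/match_part rows with other sources,
        # so only rows with source 'maker' and type 'gene' are tallied.
        if cols[1] == "maker" and cols[2] == "gene":
            prefix = id_prefix(cols[8])
            if prefix:
                counts[prefix] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=snap_masked-chr1-gene-0.1",
        "chr1\tmaker\tgene\t2000\t3500\t.\t-\t.\tID=evm-chr1-gene-0.2",
        "chr1\tsnap\tmatch\t100\t900\t.\t+\t.\tID=chr1:hit:1",
    ]
    print(count_gene_sources(rows))
```

To use it on a real file, pass an open file handle for the merged GFF3 instead of the example list.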
>
> EVM is not used to obtain a consensus gene model. Its results compete just like those of all the other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn't work it produces either no model or partial models.
>
> --Carson
>
>> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>>
>> Dear Carson,
>>
>> Thank you so much! I am now peeking into the results for the finished scaffolds. In the gff file, the gene ID confuses me a bit. In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm", "augustus", etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?
>>
>> Best Regards,
>> Ray
>>
>> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
>> Maybe. I haven't tested this, but it should work. MAKER supports labels for input by placing a ":" and a label after each file name.
>>
>> Example ->
>> est=file1.fasta:label_1,file2.fasta:label_2
>>
>> If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.
>>
>> As a result, you should be able to add that label to the EVM settings like so, and it will match column 2 of the GFF3 ->
>> evmtrans:est2genome:label_1=10
>>
>> I don't know if the label will force any raw analysis to rerun, but it shouldn't.
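
The labelling scheme described above can be sketched as follows. This is only an illustration of the string conventions quoted in the message (file:label in the control file, aligner:label in GFF3 column 2, and a matching key in the EVM settings), which Carson notes are untested. The helper names are hypothetical and nothing here calls MAKER itself.

```python
def parse_labelled_files(value):
    """Split 'file1:label_1,file2:label_2' into (path, label) pairs.

    A file given without a label yields (path, None).
    """
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        path, sep, label = item.rpartition(":")
        if sep:
            pairs.append((path, label))
        else:
            pairs.append((item, None))
    return pairs


def expected_source(aligner, label):
    """Column-2 source string expected in the GFF3 for a labelled input."""
    return f"{aligner}:{label}" if label else aligner


def evm_weight_line(kind, source, weight):
    """Build one EVM settings line, e.g. 'evmtrans:est2genome:label_1=10'."""
    return f"{kind}:{source}={weight}"


if __name__ == "__main__":
    est_value = "file1.fasta:label_1,file2.fasta:label_2"
    # Hypothetical weights per label, chosen only for illustration.
    weights = {"label_1": 10, "label_2": 5}
    for path, label in parse_labelled_files(est_value):
        source = expected_source("est2genome", label)
        print(path, "->", evm_weight_line("evmtrans", source, weights[label]))
```

The same pattern would presumably apply to labelled protein files, but the corresponding settings key is not given in the thread, so it is left out here.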
>>
>> --Carson
>>
>>> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote:
>>>
>>> Hi Carson,
>>>
>>> Currently I am partitioning the protein evidence based on phylogenetic relationship into several datasets, supplied as a comma-delimited list. Is it then possible to specify a higher weight for protein2genome models from more closely related species than for those from more distantly related taxa?
>>>
>>> Ray
>>>
>>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui wrote:
>>> Dear Carson,
>>>
>>> Thank you for the pointers! Before running the first round of Maker, I mapped conspecific Trinity-assembled proteins (the long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained Genemark-ET using TopHat alignments per their instructions. I'm wondering whether a second round will be worth doing, but I guess I will see.
>>>
>>> It is good to know that MAKER will reuse the old results.
>>>
>>> Best Regards,
>>> Ray
>>>
>>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt wrote:
>>> You can find lots of info in the devel archives on training.
Example -> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>>>
>>> There is also an example of training SNAP on the wiki -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>>>
>>> MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them. The exception to this is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. Also, they are not archived because doing so created IO issues (exonerate does not run in bulk batches like BLAST, but rather as multiple small separate runs for each polished read, and archiving a lot of small raw reports can occur so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate rather than develop a database-like bundling/compression mechanism to get around the IO issues.
>>>
>>> Thanks,
>>> Carson
>>>
>>>> On Mar 14, 2017, at 10:44 AM, Ray Cui wrote:
>>>>
>>>> Hi Carson,
>>>> Thanks for your prompt response!
>>>>
>>>> I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models?
>>>> After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc.) in the second iteration, since they are exactly the same as in the first run. Perhaps Maker can take in the alignment results from the previous run?
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> Dr.
Rongfeng (Ray) Cui
>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>> Tel.:+49 (0)221 496
>>>> Mobile: +49 0221 37970 496
>>>> rcui at age.mpg.de
>>>> www.age.mpg.de
>>>>
>>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui wrote:
>>>> I see. If my evm config looks like this:
>>>> evmab=5 #default weight for source unspecified ab initio predictions
>>>> evmab:snap=5 #weight for snap sourced predictions
>>>> evmab:augustus=10 #weight for augustus sourced predictions
>>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions
>>>> evmab:genemark=5 #weight for genemark sourced predictions
>>>>
>>>> and column 2 in the genemark.gff is "GeneMark.hmm", then the value from "evmab" (=5) will be used, is that correct?
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt wrote:
>>>> Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. That column will also be used by EVM to bin features by their source and apply weights based on source.
>>>>
>>>> --Carson
>>>>
>>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui wrote:
>>>>>
>>>>> Thanks! I didn't know you can also name the gff, but I think using the default is fine; that's what I'm doing now.
>>>>>
>>>>> Best Regards,
>>>>> Ray
>>>>>
>>>>> Dr.
Rongfeng (Ray) Cui
>>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>>> Tel.:+49 (0)221 496
>>>>> Mobile: +49 0221 37970 496
>>>>> rcui at age.mpg.de
>>>>> www.age.mpg.de
>>>>>
>>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt wrote:
>>>>>
>>>>> These are set in the maker_evm.ctl file.
>>>>>
>>>>> Use whatever you used in the source column of the input GFF3. For example, if column 2 is set as GENEMARK, then do this ->
>>>>> evmab:GENEMARK=7
>>>>>
>>>>> This also works ->
>>>>> evmab:pred_gff:GENEMARK=7
>>>>>
>>>>> Or just set the default ->
>>>>> evmab=7
>>>>>
>>>>> --Carson
>>>>>
>>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui wrote:
>>>>>>
>>>>>> Dear Carson,
>>>>>>
>>>>>> I think it may be most straightforward to input the GFF3 instead.
>>>>>>
>>>>>> What is the correct way of setting a weight in the EVM step for the GFF3 models passed through the pred_gff option?
>>>>>>
>>>>>> Ray
>>>>>>
>>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt wrote:
>>>>>> It may work as is, as long as you don't need any of the additional options that have been added. If not, you can also just run it outside of MAKER and then provide the result in GFF3 format to pred_gff.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui wrote:
>>>>>>>
>>>>>>> I see.
Are there any recent plans to incorporate it into Maker?
>>>>>>>
>>>>>>> If not, I could try to see if I can adapt the current Maker script.
>>>>>>>
>>>>>>> Ray
>>>>>>>
>>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt wrote:
>>>>>>> Yes. This is a recent update. It's an attempt to merge GeneMark-ET and GeneMark-EP into the GeneMark-ES scripts.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui wrote:
>>>>>>>>
>>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap.
>>>>>>>>
>>>>>>>> I think there must have been a big update between different Genemark versions. It seems that they now also support evidence being fed into the prediction stage.
>>>>>>>>
>>>>>>>> The name of the latest version of the genemark script has been changed to "gmes_petap.pl", with the following command line options:
>>>>>>>>
>>>>>>>> Usage: /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] --sequence [filename]
>>>>>>>>
>>>>>>>> GeneMark-ES Suite version 4.33
>>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
>>>>>>>>
>>>>>>>> Input sequence/s should be in FASTA format
>>>>>>>>
>>>>>>>> Algorithm options
>>>>>>>> --ES to run self-training
>>>>>>>> --fungus to run algorithm with branch point model (most useful for fungal genomes)
>>>>>>>> --ET [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
>>>>>>>> --et_score [number]; 4 (default) minimum score of intron in initiation of the ET algorithm
>>>>>>>> --evidence [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
>>>>>>>> --training_only to run only training step
>>>>>>>> --prediction_only to run only prediction step
>>>>>>>> --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)
>>>>>>>>
>>>>>>>> Sequence pre-processing options
>>>>>>>> --max_contig [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
>>>>>>>> --min_contig [number]; 50000 (default); will ignore contigs shorter then min_contig in training
>>>>>>>> --max_gap [number]; 5000 (default); will split sequence at gaps longer than max_gap
>>>>>>>> Letters 'n' and 'N' are interpreted as standing within gaps
>>>>>>>> --max_mask [number]; 5000 (default); will split sequence at repeats longer then max_mask
>>>>>>>> Letters 'x' and 'X' are interpreted as results of hard masking of repeats
>>>>>>>> --soft_mask [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
>>>>>>>>
>>>>>>>> Run
options
>>>>>>>> --cores [number]; 1 (default) to run program with multiple threads
>>>>>>>> --pbs to run on cluster with PBS support
>>>>>>>> --v verbose
>>>>>>>>
>>>>>>>> Customizing parameters:
>>>>>>>> --max_intron [number]; default 10000 (3000 fungi), maximum length of intron
>>>>>>>> --max_intergenic [number]; default 10000, maximum length of intergenic regions
>>>>>>>> --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
>>>>>>>>
>>>>>>>> Developer options:
>>>>>>>> --usr_cfg [filename]; to customize configuration file
>>>>>>>> --ini_mod [filename]; use this file with parameters for algorithm initiation
>>>>>>>> --test_set [filename]; to evaluate prediction accuracy on the given test set
>>>>>>>> --key_bin
>>>>>>>> --debug
>>>>>>>> # -------------------
>>>>>>>>
>>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt wrote:
>>>>>>>> Also note that the gmhmme3 executable distributed with different flavors of genemark has had the same name but has been quite different in both command line structure and output between flavors.
>>>>>>>>
>>>>>>>> --Carson
>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui wrote:
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters automatically set by Maker when calling Genemark?
>>>>>>>>> If you can point me to the part of the Maker source code that constructs the final genemark command line, I can also take a look.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Ray
>>>>>>>>>
>>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt wrote:
>>>>>>>>> The names of the scripts used are listed in the maker_exe.ctl file. It depends on whether the formatting or any flags have changed between versions.
>>>>>>>>>
>>>>>>>>> --Carson
>>>>>>>>>
>>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui wrote:
>>>>>>>>>>
>>>>>>>>>> Dear Carson,
>>>>>>>>>>
>>>>>>>>>> I have now run GeneMark-ET, and it produces a trained .mod file. I think it can then be passed to Maker. Do you know what the final constructed command line in Maker that calls genemark is? Genemark-ET and -ES use the same Perl script, so one probably only needs to use the --prediction and --predict_with xxx.mod options to predict genes using the species-specific parameters (bypassing the regular training and prediction steps).
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>> Dr.
Rongfeng (Ray) Cui
>>>>>>>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>>>>>>>> Tel.:+49 (0)221 496
>>>>>>>>>> Mobile: +49 0221 37970 496
>>>>>>>>>> rcui at age.mpg.de
>>>>>>>>>> www.age.mpg.de
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt wrote:
>>>>>>>>>> MAKER support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER-related archive posts etc. will relate to GeneMark-ES.
>>>>>>>>>>
>>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don't know how this compares to GeneMark-ET.
>>>>>>>>>>
>>>>>>>>>> --Carson
>>>>>>>>>>
>>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>
>>>>>>>>>>> Thanks! It seems that Genemark-ET has a "--training" flag. Is that the flag I should use when training, or should I just let Genemark also perform the prediction?
>>>>>>>>>>>
>>>>>>>>>>> Ray
>>>>>>>>>>>
>>>>>>>>>>> Dr.
Rongfeng (Ray) Cui
>>>>>>>>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>>>>>>>>> Tel.:+49 (0)221 496
>>>>>>>>>>> Mobile: +49 0221 37970 496
>>>>>>>>>>> rcui at age.mpg.de
>>>>>>>>>>> www.age.mpg.de
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel wrote:
>>>>>>>>>>> Hi Ray,
>>>>>>>>>>>
>>>>>>>>>>> I think you're on the right track with training Genemark with RNAseq data. It should only change the training steps, which are external to MAKER, but not how MAKER runs Genemark. You'll still give MAKER the path to the "es.mod" file made by Genemark.
>>>>>>>>>>>
>>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence.
>>>>>>>>>>>
>>>>>>>>>>> ~Daniel
>>>>>>>>>>>
>>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I have successfully installed Maker beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictors.
>>>>>>>>>>>> When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate Genemark-ET predictions into Maker. Should I run Genemark-ET independently of Maker, then integrate the GFF at some point during the Maker process? If so, how should I edit the configuration file? Currently Maker has an option called "gmhmm". Should I then train GeneMark by myself with RNAseq data, then feed the hmm to Maker?
>>>>>>>>>>>>
>>>>>>>>>>>> A perhaps unrelated question: Maker beta 3 now supports EVM. I'm wondering how EVM is used by Maker (at which step, and what it does), and how it differs from what Maker is designed for (both reconcile different gene models).
>>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Ray
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> maker-devel mailing list
>>>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From qwzhang0601 at gmail.com Thu Mar 16 21:48:10 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Thu, 16 Mar 2017 23:48:10 -0400
Subject: [maker-devel] split genes
Message-ID:

Hello:

If one gene is covered by two contigs, it may sometimes be predicted as two genes. I wonder how Maker deals with such cases? Even if Maker tries to reduce such cases, they cannot be completely avoided.
So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?

As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the Maker pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, could the predicted genes then be merged into one?

Thanks

Best
Quanwei

From carsonhh at gmail.com Fri Mar 17 09:21:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Mar 2017 09:21:10 -0600
Subject: [maker-devel] split genes
In-Reply-To:
References:
Message-ID: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>

MAKER will not try to predict a gene across contigs because it is too difficult to determine contig order. If you are able to determine the order, then it is best to merge the contigs into a single scaffold before annotating rather than try to produce split models in GFF3.

--Carson

> On Mar 16, 2017, at 9:48 PM, Quanwei Zhang wrote:
>
> Hello:
>
> If one gene is covered by two contigs, it may sometimes be predicted as two genes. I wonder how Maker deals with such cases?
> Even if Maker tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?
>
> As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the Maker pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, could the predicted genes then be merged into one?
>
> Thanks
>
> Best
> Quanwei

From qwzhang0601 at gmail.com Fri Mar 17 11:49:06 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Fri, 17 Mar 2017 13:49:06 -0400
Subject: [maker-devel] split genes
In-Reply-To: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>
References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>
Message-ID:

Thank you for your explanation. But do you have any suggestions on such issues? Are there any tools to detect such split genes, or any other tools that can further improve the gene models obtained by Maker?

Thanks.

Best
Quanwei

2017-03-17 11:21 GMT-04:00 Carson Holt:

> MAKER will not try to predict a gene across contigs because it is too difficult to determine contig order. If you are able to determine the order, then it is best to merge the contigs into a single scaffold before annotating rather than try to produce split models in GFF3.
>
> --Carson
>
> > On Mar 16, 2017, at 9:48 PM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > If one gene is covered by two contigs, it may sometimes be predicted as two genes. I wonder how Maker deals with such cases?
> > Even if Maker tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?
> >
> > As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the Maker pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, could the predicted genes then be merged into one?
> > > > Thanks > > > > Best > > Quanwei > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Fri Mar 17 16:37:16 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Fri, 17 Mar 2017 18:37:16 -0400 Subject: [maker-devel] putative gene function by mapping to UniProt/Swiss-prot set Message-ID: Hello: I have a questions about the assigning putative gene function by mapping to UniProt/Swiss-prot gene set (described in the protocol published in 2014). Here, for each of the gene model from Maker, the pipeline will find the most similar protein in UniProt/Swiss-prot and assign the function of the matched protein, right? It does not require best-reciprocal hit, right? Thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Mon Mar 20 07:03:10 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 20 Mar 2017 09:03:10 -0400 Subject: [maker-devel] putative gene function by mapping to UniProt/Swiss-prot set In-Reply-To: References: Message-ID: Hi Quanwei, Correct. Just the best hit when blasting the MAKER generated fasta sequences to Swiss-prot. Thanks, Mike > On Mar 17, 2017, at 6:37 PM, Quanwei Zhang wrote: > > Hello: > > I have a questions about the assigning putative gene function by mapping to UniProt/Swiss-prot gene set (described in the protocol published in 2014). > Here, for each of the gene model from Maker, the pipeline will find the most similar protein in UniProt/Swiss-prot and assign the function of the matched protein, right? > It does not require best-reciprocal hit, right? 
>
> Thanks
> Best
> Quanwei

From qwzhang0601 at gmail.com Mon Mar 20 11:09:28 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Mon, 20 Mar 2017 13:09:28 -0400
Subject: [maker-devel] evidence of transcript assembly
Message-ID:

Hello:

I am using MAKER2 to do gene annotation on a new rodent species. I have found some published RNA-seq data, and there are selected open reading frames. Generally, the authors built the transcript assembly with Trinity, then mapped the raw transcript assemblies to the mouse genome and selected those with full or partial coverage of mouse genes. I have a question about the transcript assembly evidence for MAKER. Which do you think is the best choice as evidence for MAKER2?
(1) All the Trinity transcript assemblies?
(2) Trinity transcript assemblies that fully cover the mouse genes?
(3) Trinity transcript assemblies that either fully or partly cover the mouse genes?

Many thanks

Best
Quanwei

From glenna.kramer at utoronto.ca Mon Mar 20 19:37:45 2017
From: glenna.kramer at utoronto.ca (Glenna Kramer)
Date: Tue, 21 Mar 2017 01:37:45 +0000
Subject: [maker-devel] GFF no longer valid after renaming genes
Message-ID: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>

Hi there,

I am hoping that you can give me some assistance with finishing up my MAKER-annotated genome for submission. I have been able to rename the genes for GenBank submission using Support Protocol 2 in the paper by Campbell et al., "Genome Annotation and Curation Using MAKER and MAKER-P", Curr Protoc Bioinformatics. 2014; 48: 4.11.1-4.11.39. (PMC4286374).
I have also been able to use Support Protocol 3 from that same paper to assign a putative gene function. However, I am running into problems when I try to convert the GFF file to the tbl format for submission. I have tried to use scripts from GAG (Genome Annotation Generator) and maker (gff32table). Both of these scripts work wonderfully on the GFF originally output from MAKER, but do not work once I rename the genes for GenBank submission. When I feed my file into a GFF validator, it turns out that my GFF is valid prior to renaming, but after I rename, the GFF is no longer valid. I have been trying to troubleshoot what is happening to my GFF when I rename as in Support Protocol 2, but am stumped. Has anyone else out there had a similar issue? I would be very thankful for any insight that you can provide!

Best,
Glenna

Not sure if this will be helpful, but here is an example gene from prior to renaming:

##gff-version 3
ChromoV|quiver|quiver maker gene 62081 62650 . + .
ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9 ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189 ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 And after renaming: ##gff-version 3 ChromoV|quiver|quiver maker gene 62081 62650 . + . ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9; ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00; ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; The commands I used were: % maker_map_ids --prefix_A9K44_ --justify 4 myfilename.gff>myfilename.map %map_gff_ids myfilename.map myfilename.gff -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adf at ncgr.org Mon Mar 20 19:49:22 2017 From: adf at ncgr.org (Andrew Farmer) Date: Mon, 20 Mar 2017 19:49:22 -0600 Subject: [maker-devel] GFF no longer valid after renaming genes In-Reply-To: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca> References: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca> Message-ID: <127be156-b2bd-574f-5187-9942f05220e2@ncgr.org> Hi Glenna- this may be totally off-base but I have a vague memory that some validators will complain about the semicolon after the last attribute in the column nine attribute list; it's not clear to me from the specification that this is truly illegal, but can imagine why a parser might not like to deal with it. In any case, you might try just removing that terminal semicolon character and see if that solves the validation complaint. but apologies in advance if my dim recollection has misled me into wasting your time... Andrew Farmer On 3/20/17 7:37 PM, Glenna Kramer wrote: > Hi there, > > I am hoping that you can give me some assistance with finishing up my > maker annotated genome for submission. I have been able to rename the > genes for GenBank submission - using Support Protocol 2 in the paper > by Campbell et. al "Genome Annotation and Curation Using MAKER and > MAKER-P" Curr Protoc Bioinformatics. 2014; 48: 4.11.1?4.11.39. > (PMC4286374). > I have also been able to use the Support Protocol 3 from that same > paper to assign a putative gene function. However, I am running into > problems when I am trying to convert the GFF file to the tbl format > for submission. I have tried to use scripts from GAG (Genome > Annotation Generator) and maker (gff32table). Both of these scripts > work wonderfully on the gff originally output from maker, but do not > work once I rename the genes for GenBank submission. 
When I feed my > file into a gff validator it turns out that my gff is valid prior to > renaming, but after I rename the gff is no longer valid. I have been > trying to troubleshoot what is happening to my gff when I rename as in > Support Protocol 2, but am stumped. Has anyone else out there had a > similar issue? I would be very thankful for any insight that you can > provide! > > Best, > Glenna > > Not sure if this will be helpful, but here is an example gene from > prior to renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9 > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189 > ChromoV|quiver|quiver maker exon 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > > And after renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . > ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9; > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . 
> ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00;
> ChromoV|quiver|quiver maker exon 62081 62650 . + .
> ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1;
> ChromoV|quiver|quiver maker CDS 62081 62650 . + 0
> ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1;
>
> The commands I used were:
>
> % maker_map_ids --prefix_A9K44_ --justify 4 myfilename.gff>myfilename.map
>
> %map_gff_ids myfilename.map myfilename.gff

-- 
...all concepts in which an entire process is semiotically concentrated elude definition; only that which has no history is definable.

Friedrich Nietzsche

From carsonhh at gmail.com Tue Mar 21 10:15:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 Mar 2017 10:15:20 -0600
Subject: [maker-devel] GFF no longer valid after renaming genes
In-Reply-To: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>
References: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>
Message-ID: <5DFD02E2-2C6F-49DA-90DE-9E17EE0A8CE2@gmail.com>

The problem appears to be the multiple '|' characters in your contig names (ChromoV|quiver|quiver). They end up in the gene ID, and since '|' has a special meaning in Perl, it creates weird replacement behavior. I've attached two scripts that will fix that.
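
The replacement behavior described above can be reproduced with a minimal shell sketch. It is only an illustration under stated assumptions: sed -E stands in for the Perl substitution (the actual code of the renaming scripts is not shown in this thread), and the old and new IDs are copied from the example gene earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch only: sed -E is used as a stand-in for a regex-based rename;
# this is NOT the code of maker_map_ids or map_gff_ids.

old_id='augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9'
new_id='A9K44_2555'
line="ID=${old_id}"

# Unescaped: in an extended regex "|" means alternation, so the pattern reads
# "augustus_masked-ChromoV" OR "quiver" OR "quiver-processed-gene-0.9".
# Only the first alternative matches first, and the rest of the old ID is kept.
broken=$(printf '%s\n' "$line" | sed -E "s/${old_id}/${new_id}/")

# Literal: a quoted pattern inside ${var//pattern/replacement} is matched
# character by character, so "|" and "." lose their special meaning.
fixed=${line//"$old_id"/$new_id}

printf 'unescaped: %s\n' "$broken"   # ID=A9K44_2555|quiver|quiver-processed-gene-0.9
printf 'literal:   %s\n' "$fixed"    # ID=A9K44_2555
```

The unescaped result has the same shape as the broken gene ID in the renamed GFF. In Perl, the usual way to get the literal behavior is to wrap the interpolated ID in \Q...\E; whether the attached scripts take that approach is not stated here.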
Use the two attached scripts to replace their counterparts in the .../maker/bin/ and .../maker/src/bin/ directories, then rerun all renaming steps on a new GFF3 (not the one you already tried to rename).

Also, you may want to consider changing the IDs in the assembly itself before you release it or use it for analysis. You would want to remove the '|quiver|quiver' tail on every contig. That tail has the potential to cause hidden downstream analysis errors in other tools for the same reasons outlined above, since '|' characters have special meaning.

Thanks,
Carson

> On Mar 20, 2017, at 7:37 PM, Glenna Kramer wrote:
>
> Hi there,
>
> I am hoping that you can give me some assistance with finishing up my MAKER-annotated genome for submission. I have been able to rename the genes for GenBank submission using Support Protocol 2 in the paper by Campbell et al., "Genome Annotation and Curation Using MAKER and MAKER-P", Curr Protoc Bioinformatics. 2014; 48: 4.11.1-4.11.39. (PMC4286374). I have also been able to use Support Protocol 3 from that same paper to assign a putative gene function. However, I am running into problems when I try to convert the GFF file to the tbl format for submission. I have tried to use scripts from GAG (Genome Annotation Generator) and maker (gff32table). Both of these scripts work wonderfully on the GFF originally output from MAKER, but do not work once I rename the genes for GenBank submission. When I feed my file into a GFF validator, it turns out that my GFF is valid prior to renaming, but after I rename, the GFF is no longer valid. I have been trying to troubleshoot what is happening to my GFF when I rename as in Support Protocol 2, but am stumped. Has anyone else out there had a similar issue? I would be very thankful for any insight that you can provide!
>
> Best,
> Glenna
>
> Not sure if this will be helpful, but here is an example gene from prior to renaming:
>
> ##gff-version 3
> ChromoV|quiver|quiver maker gene 62081 62650 . + .
ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9 > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189 > ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > > And after renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9; > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00; > ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; > ChromoV|quiver|quiver maker CDS 62081 62650 . 
+ 0 ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; > > The commands I used were: > > % maker_map_ids --prefix_A9K44_ --justify 4 myfilename.gff>myfilename.map > > %map_gff_ids myfilename.map myfilename.gff > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: map_fasta_ids Type: application/octet-stream Size: 1676 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: map_gff_ids Type: application/octet-stream Size: 5048 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 21 11:00:06 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Mar 2017 11:00:06 -0600 Subject: [maker-devel] split genes In-Reply-To: References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com> Message-ID: I have no suggestions, but maybe someone else on the list may have some. ?Carson > On Mar 17, 2017, at 11:49 AM, Quanwei Zhang wrote: > > Thank you for your explanation. But do you have any suggestions on such issues? Is there any tools to detect such split genes or any other tool can even further improve the gene models obtained by Maker? Thanks. > > Best > Quanwei > > 2017-03-17 11:21 GMT-04:00 Carson Holt >: > MAKER will not try and predict a gene across contigs because it it too difficult to determine contig order. If you are able to determine order, then it is best to merge the contigs into a single scaffold before annotating rather than try and produce split models in GFF3. 
> > ?Carson > > > On Mar 16, 2017, at 9:48 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > If one gene was covered by two contigs, sometimes we may predicted two genes. I wonder how Maker deal with such conditions? > > Even Maker tried to reduce such cases, they can not be completely avoid. So I wonder whether there is any way or any tool to find such split genes (one gene split into two contigs and predicted as two genes)? > > > > As we know, we can also provide protein sequences and transcript assembly as evidences. Can a protein sequence or transcript assembly rescue the split genes in Maker pipe line? For example, if one transcript cover 40% of predicted genes predicted in two contigs, then merge the predicted genes into one? > > > > Thanks > > > > Best > > Quanwei > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 21 11:01:30 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Mar 2017 11:01:30 -0600 Subject: [maker-devel] evidence of transcript assembly In-Reply-To: References: Message-ID: <297B9C95-919E-4D4F-9103-1FED1550B745@gmail.com> Different sources of data will have different levels of quality. You may want to run them all, then look at results in a browser like Apollo. If specific source look like they are more problematic than others, then drop them. ?Carson > On Mar 20, 2017, at 11:09 AM, Quanwei Zhang wrote: > > Hello: > > I am using Maker2 to do gene annotation on a new rodent species. I have found some published RNA-seq data and there are selected open reading frames. Generally they get the transcript assembly through Trinity, after that they mapped the raw transcript assemblies to mouse genome and selected those with full coverage of mouse genes or part coverage. 
I have a question about the transcript assembly evidence for MAKER. Which do you think is the best choice as evidence for MAKER2?
> (1) All the Trinity transcript assemblies?
> (2) Trinity transcript assemblies that fully cover the mouse genes?
> (3) Trinity transcript assemblies that either fully or partly cover the mouse genes?
>
> Many thanks
>
> Best
> Quanwei

From cjfields at illinois.edu Tue Mar 21 11:47:21 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 21 Mar 2017 17:47:21 +0000
Subject: [maker-devel] split genes
In-Reply-To:
References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>
Message-ID:

Just curious, but have you tried scaffolding your assembly using your RNA-Seq de novo assembly data? We've seen some improvement in BUSCO calls and annotation after doing this with L_RNA_Scaffolder (though you do need to be a bit careful and try reducing your transcript assembly down to a somewhat non-redundant set).

chris

From: maker-devel on behalf of Carson Holt
Date: Tuesday, March 21, 2017 at 12:00 PM
To: Quanwei Zhang
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] split genes

I have no suggestions, but maybe someone else on the list may have some.

--Carson

On Mar 17, 2017, at 11:49 AM, Quanwei Zhang wrote:

Thank you for your explanation. But do you have any suggestions on such issues? Are there any tools to detect such split genes, or any other tool that can further improve the gene models obtained by MAKER? Thanks.

Best
Quanwei

2017-03-17 11:21 GMT-04:00 Carson Holt:

MAKER will not try to predict a gene across contigs because it is too difficult to determine contig order. If you are able to determine the order, then it is best to merge the contigs into a single scaffold before annotating rather than try to produce split models in GFF3.
--Carson

> On Mar 16, 2017, at 9:48 PM, Quanwei Zhang wrote:
>
> Hello:
>
> If one gene is covered by two contigs, sometimes we may predict two genes. I wonder how MAKER deals with such conditions?
> Even if MAKER tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?
>
> As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the MAKER pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, can the predicted genes then be merged into one?
>
> Thanks
>
> Best
> Quanwei

From rainer.rutka at uni-konstanz.de Fri Mar 24 03:10:45 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Fri, 24 Mar 2017 10:10:45 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID:

Hi!

First of all, thank you for your previous help.

MAKER 2.31.9 with Intel MPI runs fine if we use ONE node only. But if we try to combine more than one node (e.g. 2 nodes with 8 cores each), we get this error:

[...]
### Running Maker example
MOAB_PROCCOUNT: 16
slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead
STATUS: Parsing control files...
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
[...]

/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356 was created beforehand and EXISTS for the whole duration of the job.

I attached the complete log to this e-mail.

Again: THANK YOU VERY MUCH.

All the best.

-- 
Rainer Rutka
Universität Konstanz
Kommunikations-, Informations-, Medienzentrum (KIM)
* KIM Ausbildung
* Wissenschaftliches Rechnen/bwHPC-C5
* KIM Basisdienste, KIM Support
Raum: V511
78457 Konstanz
+49 7531 88-5413
-------------- next part --------------
#!/bin/bash
#MSUB -N maker-job
#MSUB -j oe
#MSUB -o $(JOBNAME).$(JOBID)
#MSUB -m ae
# -M given_name.family_name at your-uni.de
#MSUB -l nodes=2:ppn=8
#MSUB -l mem=20gb
#MSUB -l walltime=01:00:00
#
start=$(date +%s)
echo " "
echo "### Setting up shell environment ..."
echo " "
#
if test -e "/etc/profile"; then source "/etc/profile"; fi;
if test -e "$HOME/.bash_profile"; then source "$HOME/.bash_profile"; fi;
unset LANG; export LC_ALL="C"; export MKL_NUM_THREADS=1; export OMP_NUM_THREADS=1
export USER=${USER:=`logname`}
export MOAB_JOBID=${MOAB_JOBID:=`date +%s`}
export MOAB_SUBMITDIR=${MOAB_SUBMITDIR:=`pwd`}
export MOAB_JOBNAME=${MOAB_JOBNAME:=`basename "$0"`}
export MOAB_JOBNAME=$(echo "${MOAB_JOBNAME}" | sed 's/[^a-zA-Z0-9._-]/_/g')
export MOAB_NODECOUNT=${MOAB_NODECOUNT:=1}
export MOAB_PROCCOUNT=${MOAB_PROCCOUNT:=1}
ulimit -s 200000
echo " "
echo "### Printing basic job infos to stdout ..."
echo " "
echo "START_TIME = `date +'%y-%m-%d %H:%M:%S %s'`"
echo "HOSTNAME = ${HOSTNAME}"
echo "USER = ${USER}"
echo "MOAB_JOBNAME = ${MOAB_JOBNAME}"
echo "MOAB_JOBID = ${MOAB_JOBID}"
echo "MOAB_SUBMITDIR = ${MOAB_SUBMITDIR}"
echo "MOAB_NODECOUNT = ${MOAB_NODECOUNT}"
echo "MOAB_PROCCOUNT = ${MOAB_PROCCOUNT}"
echo "SLURM_NODELIST = ${SLURM_NODELIST}"
echo "PBS_NODEFILE = ${PBS_NODEFILE}"
if test -f "${PBS_NODEFILE}"; then
  echo "PBS_NODEFILE (begin) ---------------------------------"
  NO_NODES=$(wc -l < ${PBS_NODEFILE})
  cat "${PBS_NODEFILE}"
  echo "PBS_NODEFILE (end) -----------------------------------"
else
  NO_NODES=1
fi
#
##############################################################################
echo " "
echo "### Creating TMP_WORK_DIR directory and changing to it ..."
echo " "
# Using "/tmp/$USER" should be ok for one node jobs. In case of multi-node jobs
# it might be neccessary to modify TMP_BASE_DIR to point to SLURM_SUBMIT_DIR
# or to create (and delete) TMP_WORK_DIR on each node (job-type dependent).
# NEVER EVER calculate in your home directory.
JOB_WORK_DIR="${SLURM_JOB_NAME}.uc1.${SLURM_JOB_ID%%.*}.$(date +%y%m%d_%H%M%S)"
if test -z "$SLURM_NNODES" -o "$SLURM_NNODES" = "1"
then
  TMP_BASE_DIR="/tmp/${USER}"
else
  # in case of 2 or more nodes, use a common scratch dir available on all nodes...
  TMP_BASE_DIR="$SLURM_SUBMIT_DIR"
fi
TMP_WORK_DIR="${TMP_BASE_DIR}/${JOB_WORK_DIR}"
echo "JOB_WORK_DIR = ${JOB_WORK_DIR}"
echo "TMP_BASE_DIR = ${TMP_BASE_DIR}"
echo "TMP_WORK_DIR cd = ${TMP_WORK_DIR}"
mkdir -vp "${TMP_WORK_DIR}" && { cd "${TMP_WORK_DIR}"; pwd; } || { echo "ERROR: cd $TMP_WORK_DIR"; exit 1; }
# Remarks:
# * The job's temporary subdirectory JOB_WORK_DIR consists of SLURM_JOB_NAME
#   and SLURM_JOB_ID connected by ".uc1.". This is a little bit of magic since
#   the output file of your job follows the same rule. Therefore the
#   sorting of files belonging to one job will work nicely, when you
#   list the result files later in the submit directory (SLURM_SUBMIT_DIR).
# * Using TMP_BASE_DIR="/tmp/$USER" is ok, if the job requires less
#   than 3.6 TB of node local disk space (for details see "www.bwhpc-c5.de").
#
##############################################################################
echo " "
echo "### Loading MAKER module:"
echo " "
module load bio/maker/2.31.9
[ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
echo "MAKER_VERSION = $MAKER_VERSION"
module list
echo " "
echo "### Copying input examples files for job:"
echo " "
cp -v ${MAKER_EXA_DIR}/*.{fasta,ctl} .
sleep 2
echo " "
echo "### Display internal Maker/bwHPC environments..."
echo " "
echo "MAKER_BIN_DIR = ${MAKER_BIN_DIR}"
echo "MAKER_EXA_DIR = ${MAKER_EXA_DIR}"
echo ""
echo " "
echo "### Runing Maker example"
echo " "
export OMPI_MCA_mpi_warn_on_fork=0
#
# Do NOT use mpiexec here. Unfortunately this crashes
# "STATUS: Processing and indexing input FASTA files..."
# exec.hydra -n 2 maker -h
echo "MOAB_PROCCOUNT: ${MOAB_PROCCOUNT:=1}"
# do NOT use mpiexec. use mpiexec.hydra or mpirun.
# mpirun -n ${MOAB_PROCCOUNT} maker -h
# mpirun -n ${MOAB_PROCCOUNT} maker 2>&1 >maker_$(date +%Y-%m-%d_%H:%M:%S).out
mpirun -n ${MOAB_PROCCOUNT} maker
echo "### Cleaning up files ... removing unnecessary scratch files ..."
echo " "
# rm -fv
sleep 3 # Sleep some time so potential stale nfs handles can disappear.
echo " "
echo "### Compressing results and copying back result archive ..."
echo " "
cd "${TMP_BASE_DIR}"
mkdir -vp "${MOAB_SUBMITDIR}" # if user has deleted or moved the submit dir
echo "Creating result tgz-file '${MOAB_SUBMITDIR}/${JOB_WORK_DIR}.tgz' ..."
tar -zcvf "${MOAB_SUBMITDIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \
  || { echo "ERROR: Failed to create tgz-file. Please cleanup TMP_WORK_DIR '$TMP_WORK_DIR' on host '$HOSTNAME' manually (if not done automatically by queueing system)."; exit 102; }
# Remarks:
# * The resulting tgz file is copied back to the submit directory.
#   The name of the tgz file looks similar too
#   "bwunicluster-maker-example.moab.275.110528_101755.tgz"
echo " "
echo "### Final cleanup: Remove TMP_WORK_DIR ..."
echo " "
rm -rvf "${TMP_WORK_DIR}"
echo "END_TIME = `date +'%y-%m-%d %H:%M:%S %s'`"
end=$(date +%s)
echo " "
echo "### Calculate duration ..."
echo " "
diff=$[end-start]
if [ $diff -lt 60 ]; then
  echo "Runtime (approx.): '$diff' secs"
elif [ $diff -ge 60 ]; then
  echo 'Runtime (approx.): '$[$diff / 60] 'min(s) '$[$diff % 60] 'secs'
fi
-------------- next part --------------
### Setting up shell environment ...

### Printing basic job infos to stdout ...

START_TIME = 17-03-24 04:35:21 1490326521
HOSTNAME = uc1n385
USER = kn_pop235844
MOAB_JOBNAME = maker-job
MOAB_JOBID = 11658541
MOAB_SUBMITDIR = /pfs/work2/workspace/scratch/kn_pop235844-wstest-0
MOAB_NODECOUNT = 2
MOAB_PROCCOUNT = 16
SLURM_NODELIST = uc1n[385,397]
PBS_NODEFILE =

### Creating TMP_WORK_DIR directory and changing to it ...

JOB_WORK_DIR = maker-job.uc1.11658541.170324_043521
TMP_BASE_DIR = /tmp/kn_pop235844
TMP_WORK_DIR cd = /tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521
mkdir: created directory '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521'
/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521

### Loading MAKER module:

MAKER_VERSION = 2.31.9
Currently Loaded Modulefiles:
  1) compiler/intel/16.0(default) 2) mpi/impi/5.1.3-intel-16.0(default) 3) bio/maker/2.31.9

### Copying input examples files for job:

'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_contig.fasta' -> './dpp_contig.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_est.fasta' -> './dpp_est.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_protein.fasta' -> './dpp_protein.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_bopts.ctl' -> './maker_bopts.ctl'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_exe.ctl' -> './maker_exe.ctl'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_opts.ctl' -> './maker_opts.ctl'

### Display internal Maker/bwHPC environments...

MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.9/bin
MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples

### Runing Maker example

MOAB_PROCCOUNT: 16
slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521': No such file or directory: going to /tmp instead
STATUS: Parsing control files...
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
### Cleaning up files ... removing unnecessary scratch files ...

### Compressing results and copying back result archive ...

Creating result tgz-file '/pfs/work2/workspace/scratch/kn_pop235844-wstest-0/maker-job.uc1.11658541.170324_043521.tgz' ...
maker-job.uc1.11658541.170324_043521/
maker-job.uc1.11658541.170324_043521/dpp_contig.fasta
maker-job.uc1.11658541.170324_043521/dpp_est.fasta
maker-job.uc1.11658541.170324_043521/dpp_protein.fasta
maker-job.uc1.11658541.170324_043521/maker_bopts.ctl
maker-job.uc1.11658541.170324_043521/maker_exe.ctl
maker-job.uc1.11658541.170324_043521/maker_opts.ctl
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/.NFSLock.gi_lock.NFSLock
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_opts.log
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_bopts.log
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_exe.log

### Final cleanup: Remove TMP_WORK_DIR ...

removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_est.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_protein.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_bopts.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_exe.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_opts.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/.NFSLock.gi_lock.NFSLock'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_opts.log'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_bopts.log'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_exe.log'
removed directory: '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output'
removed directory: '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521'
END_TIME = 17-03-24 04:36:08 1490326568

### Calculate duration ...

Runtime (approx.): '47' secs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From carsonhh at gmail.com Fri Mar 24 09:00:58 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 24 Mar 2017 09:00:58 -0600
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID: <2D6022EE-3AFC-4B87-99A3-2D310995A844@gmail.com>

This error ->
slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead

It comes from SLURM, not from MAKER. It occurs before your job has even started: it is from the SLURM initialization of one of the nodes you are using. Note that /tmp is not shared. It is independent on each node. So /tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356 may exist on one node, but not on the others. Since you are somehow setting this as the working directory before you launch the job, SLURM is complaining because it doesn't exist on one of the other nodes during initialization. So you need to review how you are launching things.

--Carson

> On Mar 24, 2017, at 3:10 AM, Rainer Rutka wrote:
>
> Hi!
> First of all, thank you for your previous help.
> Maker 2.31.9 with MPI (Intel) runs fine if we
> use ONE node only.
>
> But if we try to combine more than one node (e.g. 2 nodes with 8
> cores each) we get this error:
>
> [...]
> ### Running Maker example
>
> MOAB_PROCCOUNT: 16
> slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead
> STATUS: Parsing control files...
> Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
> [...]
>
> /tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356
> was created beforehand and EXISTS for the whole duration of the
> job.
>
> I attached the complete log to this e-mail.
>
> Again: THANK YOU VERY MUCH.
>
> All the best.
>
> --
> Rainer Rutka
> Universität Konstanz
> Kommunikations-, Informations-, Medienzentrum (KIM)
> * KIM Ausbildung
> * Wissenschaftliches Rechnen/bwHPC-C5
> * KIM Basisdienste, KIM Support
> Raum: V511
> 78457 Konstanz
> +49 7531 88-5413
>

From carson.holt at genetics.utah.edu Wed Mar 29 12:12:35 2017
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 29 Mar 2017 18:12:35 +0000
Subject: [maker-devel] non-M gene models
In-Reply-To: <59ca4391-d32e-bfa8-4118-8c9586f3dfe4@email.arizona.edu>
References: <717138b6-fc7f-8f23-e550-c3019c4f96ec@email.arizona.edu> <59ca4391-d32e-bfa8-4118-8c9586f3dfe4@email.arizona.edu>
Message-ID: <0AD41A2D-9CFE-48DE-B338-F15D3A590B30@genetics.utah.edu>

Maybe. Those two options can result in a lot of partial models. Setting always_complete=1 will also help some. Models without M at the start are generally partial models. There is often something about the contig that keeps a model from being whole (a single-basepair error that breaks the ORF or a splice site, or a string of N's that overlaps part of an exon). You can also try identifying InterPro domains and dropping any model without a defined domain (i.e. if it's going to be partial, at least make sure it's useful in its partial form).
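For anyone wanting to pull out the affected models before deciding how to treat them, here is a minimal sketch. It assumes a protein FASTA like the maker_proteins_161026.fasta mentioned in the quoted message; the function name list_non_m is made up for illustration and is not part of MAKER.

```shell
#!/bin/bash
# list_non_m (hypothetical helper, not a MAKER script):
# print the ID of every FASTA record whose sequence does not start with M.
list_non_m() {
    awk '
        # Header line: remember the ID (text after ">" up to the first space)
        /^>/ { id = substr($1, 2); want = 1; next }
        # First sequence line after a header: check its first residue
        want { if (substr($0, 1, 1) != "M") print id; want = 0 }
    ' "$1"
}

# Example call (file name taken from the thread):
#   list_non_m maker_proteins_161026.fasta > non_M_ids.txt
```

The resulting ID list can then be compared against InterPro results or used to filter the GFF3; how to do that depends on your own pipeline.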
--Carson

On Mar 29, 2017, at 4:23 AM, Dario Copetti wrote:

Looking at the config file again I notice this:

est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no

I usually turn them on only to get models from ESTs to train Augustus and SNAP. Do you think that having these parameters on during the final annotation will produce the non-M models? If so, do you think that re-running MAKER with them turned off and using the MAKER-derived gff3 will clean out these models? Can you elaborate a bit more on the usage of these two parameters?

Thanks,
Dario

On 3/29/2017 12:07 PM, Dario Copetti wrote:

Hi Carson,

We are ready to submit several different sets of annotations, but we are now stuck with the issue of having models whose protein sequence does not start with Met, and NCBI is picky about that. Below I paste an example from a genome we are working on: as you see, most (95%) of the models start with M, but a significant fraction (almost 1500 models!) does not. We used MAKER 2.31.8, specifying the option of having models that only start with M.

We realize that this issue may not be easy to fix, and also that there are indeed isoforms that do not start with M, but how would you fix this? Within or outside MAKER I mean; any help will be appreciated. Some time ago, Josh and Sharon (cc'd) fixed the models by having the CDS start at the first M that was in frame with the exon, and wrote a script for that. Is this issue maybe fixed in a newer version of MAKER? How else would you fix it or deal with the NCBI genomes people?

Thanks,
Dario

grep -A1 ">" maker_proteins_161026.fasta | grep -v ">" | grep -v "\-\-" | cut -c1 | sort | uniq -c
    106 A
     33 C
     69 D
     88 E
     53 F
     94 G
     34 H
     86 I
     77 K
    144 L
  28245 M
     58 N
     72 P
     44 Q
     95 R
    142 S
     80 T
    114 V
     29 W
      6 X
     53 Y

--
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5
1657 E. Helen St.
Tucson, AZ 85721, USA
www.genome.arizona.edu

--
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5
1657 E. Helen St.
Tucson, AZ 85721, USA
www.genome.arizona.edu

From annabel.beichman at gmail.com Thu Mar 30 11:51:36 2017
From: annabel.beichman at gmail.com (Annabel Beichman)
Date: Thu, 30 Mar 2017 10:51:36 -0700
Subject: [maker-devel] RepeatMasker masking olfactory receptors
Message-ID: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>

Hi Carson,

I have a question about RepeatMasker within Maker: I am finding that all class II olfactory receptors (families like OR2, OR5) are being masked by RepeatMasker as "RTE-BovB" repeats. This leads to them not being annotated by Maker. I don't expect my species (a mustelid) to have a large number of BovB repeats, and when I put the sequences annotated in my genome as RTE-BovB into Repbase's CENSOR, only 13 out of 960 sequences have a hit to anything in Repbase. If I put those same sequences into NCBI BLAST, however, they all hit olfactory receptors. I am finding the same pattern with another related mustelid de novo genome. I also took the Ensembl ferret genome and ran it through the same pipeline, and I am finding a large number of BovB repeats there as well, despite there being none in the official annotation of that genome.

I used RepeatMasker with all species libraries, plus a custom library from RepeatModeler.

Any idea what might be going on?

Thanks so much!
~ Annabel

From 4urelie.K at gmail.com Thu Mar 30 12:54:07 2017
From: 4urelie.K at gmail.com (Aurelie K)
Date: Thu, 30 Mar 2017 12:54:07 -0600
Subject: [maker-devel] RepeatMasker masking olfactory receptors
In-Reply-To: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>
References: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>
Message-ID:

Hi Annabel,

I would run RM specifying your species (or group of species) with the -species option of RepeatMasker, especially if you have a custom de novo library. This will limit the cross-masking by repeats that have been identified in other species.

Cheers,
Aurelie

On 30 March 2017 at 11:51, Annabel Beichman wrote:

> Hi Carson,
> I have a question about RepeatMasker within Maker:
> I am finding that all class II olfactory receptors (families like OR2,
> OR5) are being masked by RepeatMasker as "RTE-BovB" repeats. This leads to
> them not being annotated by Maker. I don't expect my species (a mustelid)
> to have a large number of BovB repeats, and when I put the sequences
> annotated in my genome as RTE-BovB into Repbase's CENSOR only 13 out of 960
> sequences have a hit to anything in Repbase. If I put those same sequences
> into NCBI BLAST, however, they all hit olfactory receptors. I am
> finding the same pattern with another related mustelid de novo genome, and
> took the Ensembl ferret genome and ran it through the same pipeline and am
> finding a large number of BovB repeats there as well, despite there being
> none in the official annotation of that genome.
>
> I used RepeatMasker with all species libraries, plus a custom library from
> RepeatModeler.
>
> Any idea what might be going on?
>
> Thanks so much!
>
> ~ Annabel
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From rainer.rutka at uni-konstanz.de Wed Mar 1 05:30:39 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Wed, 1 Mar 2017 13:30:39 +0100 Subject: [maker-devel] Maker-Error when started with IMPI In-Reply-To: References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> Message-ID: <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> Hi Carson. Again THANK YOU for your efforts :-) Am 24.02.2017 um 18:30 schrieb Carson Holt: > Specific things. > > 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause problems with other MPI's. OK, I deleted this envirnoment. Not set any more. > 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has to be compiled for the flavor you are using, so make sure you have a separate installation of MAKER for Intel MPI). Also validate that the mpicc and libmpi.h listed during the MAKER install belong to Intel MPI. Don?t just assume they do because you loaded the module. Manually verify the paths during MAKER?s setup. I validated: UC:[kn at uc1n996 bwhpc-examples]$ module list Currently Loaded Modulefiles: 1) compiler/intel/16.0(default) 2) mpi/impi/5.1.3-intel-16.0(default) FOR MPICC: UC:[kn at uc1n996 bwhpc-examples]$ type mpicc mpicc is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc FOR LIBMPI: UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64 UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h Here i can find a mpi.h but not a libmpi.h. But I thinks this is o.k., because the SW was compiled and linkes without any errors or missing libs. > 3. 
The error you got previously should not even be possible with the current version of Intel MPI, > which is why I say that when you called mpiexec, something else (that was not Intel MPI) was launched. > Easy solution is to give the full path of mpiexec in your job, so are not relying on PATH to be unaltered in your job. mpiexec is in the PATH and the right one is/was used, too. MPIXEC: UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec mpiexec is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec UC:[kn at bwhpc-examples]$ > Do not do ?> mpiexec -nc 1 maker > Do this for example ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker OK, so i did: [...] #MSUB -l nodes=1:ppn=1 #MSUB -l mem=20gb [...] echo " " echo "### Runing Maker example" echo " " export OMPI_MCA_mpi_warn_on_fork=0 /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker [...] > 4. Build and run on the same node for your test. If you build on one node and run on another, you may > be changing your environment in ways you don?t realize that break things. So if you can build and test on > the same node and it works, then it fails when you test it elsewhere, then you have to track down how your > environment is changing. OK I did. Same node: uc1n996 UNFORTUNATELY I GOT THE SAME ERROR: [...] ### Runing Maker example LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-intel-16.0/lib/libmpi.so OMPI_MCA_mpi_warn_on_fork=0 I_MPI_CPUINFO=proc I_MPI_PMI_LIBRARY=/opt/bwhpc/common/mpi/openmpi/2.0.1-intel-16.0/lib/libpmi.so I_MPI_PIN_DOMAIN=node I_MPI_FABRICS=shm:tcp I_MPI_HYDRA_IFACE=ib0 mpiexec_uc1n342.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n342.localdomain_kn_pop235844); possible causes: 1. no mpd is running on this host 2. an mpd is running but was started without a "console" (-n option) [...] > ?Carson tbc. ? 
:-) THANX -- Rainer Rutka Universit?t Konstanz Kommunikations-, Informations-, Medienzentrum (KIM) * KIM Ausbildung * Wissenschaftliches Rechnen/bwHPC-C5 * KIM Basisdienste, KIM Support Raum: V511 78457 Konstanz +49 7531 88-5413 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5055 bytes Desc: S/MIME Cryptographic Signature URL: From rainer.rutka at uni-konstanz.de Wed Mar 1 05:51:05 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Wed, 1 Mar 2017 13:51:05 +0100 Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE In-Reply-To: <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> Message-ID: <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> Sorry, sent wrong e-mail :-( IGNORE THE FIRST MAIL I SENT! Am 01.03.2017 um 13:30 schrieb Rainer Rutka: Hi Carson. Again THANK YOU for your efforts :-) Am 24.02.2017 um 18:30 schrieb Carson Holt: > Specific things. > > 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause > problems with other MPI's. OK, I deleted this envirnoment. Not set any more. > 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has > to be compiled for the flavor you are using, so make sure you have a > separate installation of MAKER for Intel MPI). Also validate that the > mpicc and libmpi.h listed during the MAKER install belong to Intel > MPI. Don?t just assume they do because you loaded the module. Manually > verify the paths during MAKER?s setup. 
I validated: UC:[kn at uc1n996 bwhpc-examples]$ module list Currently Loaded Modulefiles: 1) compiler/intel/16.0(default) 2) mpi/impi/5.1.3-intel-16.0(default) FOR MPICC: UC:[kn at uc1n996 bwhpc-examples]$ type mpicc mpicc is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc FOR LIBMPI: UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64 UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h Here i can find a mpi.h but not a libmpi.h. But I thinks this is o.k., because the SW was compiled and linkes without any errors or missing libs. > 3. The error you got previously should not even be possible with the > current version of Intel MPI, > which is why I say that when you called mpiexec, something else (that > was not Intel MPI) was launched. > Easy solution is to give the full path of mpiexec in your job, so are > not relying on PATH to be unaltered in your job. mpiexec is in the PATH and the right one is/was used, too: MPIXEC: UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec mpiexec is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec > Do not do ?> mpiexec -nc 1 maker > Do this for example ?> > /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec > -nc maker OK, so i did: [...] #MSUB -l nodes=1:ppn=1 #MSUB -l mem=20gb [...] echo " " echo "### Runing Maker example" echo " " export OMPI_MCA_mpi_warn_on_fork=0 /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker [...] > 4. Build and run on the same node for your test. If you build on one > node and run on another, you may > be changing your environment in ways you don?t realize that break > things. 
So if you can build and test on > the same node and it works, then it fails when you test it elsewhere, > then you have to track down how your > environment is changing. OK I did. Same node: uc1n996 UNFORTUNATELY I GOT THE SAME ERROR: [...] Currently Loaded Modulefiles: 1) compiler/intel/16.0(default) 2) mpi/impi/5.1.3-intel-16.0(default) 3) bio/maker/2.31.8_impi ### Display internal Maker/bwHPC environments... MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bin MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bwhpc-examples ### Runing Maker example OMPI_MCA_mpi_warn_on_fork=0 I_MPI_CPUINFO=proc I_MPI_PMI_LIBRARY=/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/lib/libmpi.so I_MPI_PIN_DOMAIN=node I_MPI_FABRICS=shm:tcp I_MPI_HYDRA_IFACE=ib0 mpiexec_uc1n326.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n326.localdomain_kn_pop235844); possible causes: 1. no mpd is running on this host 2. an mpd is running but was started without a "console" (-n option) ### Cleaning up files ... removing unnecessary scratch files ... [...] > ?Carson tbc. ? :-) THANX -- Rainer Rutka Universit?t Konstanz Kommunikations-, Informations-, Medienzentrum (KIM) * KIM Ausbildung * Wissenschaftliches Rechnen/bwHPC-C5 * KIM Basisdienste, KIM Support Raum: V511 78457 Konstanz +49 7531 88-5413 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5055 bytes Desc: S/MIME Cryptographic Signature URL: From carsonhh at gmail.com Wed Mar 1 13:32:54 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 1 Mar 2017 13:32:54 -0700 Subject: [maker-devel] SOBA statistics of Maker annotation In-Reply-To: <2377C5DD-569C-4248-B458-349D7AEA32F5@ucr.edu> References: <688EB172-FEC8-4995-8AA2-0925AF62201A@ucr.edu> <6551374B-54FF-4047-B7A8-A49327FC0036@gmail.com> <73526BAB-57F8-4A47-AADD-DB6883573EAB@ucr.edu> <2377C5DD-569C-4248-B458-349D7AEA32F5@ucr.edu> Message-ID: <6E776F59-F71F-49F7-872A-A0E404970C7E@gmail.com> Perhaps with the way you are counting sequence from the RepeatMasker report you are double counting for repeats that overlap? MAKER reports the command line it uses as part of its STDERR, so you can manually run any step you want outside of MAKER to evaluate. ?Carson > On Feb 25, 2017, at 10:14 AM, Qihua Liang wrote: > > Thank you Barry and Carson! > > I compared the SOBA statistics of RepeatMasker footprint and the report generated by running RepeatMasker alone, I got 2 different parentage of repeats masked. Running RepeatMasker with myTrained.lib, the repeats masked are 42%. But within Maker GFF3, the percentage of repeats masker is only ~18%. What may cause such difference here? > > Thanks > Qihua > >> On Feb 21, 2017, at 1:34 PM, Carson Holt wrote: >> >> MAKER merges overlapping RepeatMasker results into a single longer feature. >> >> ?Carson >> >> >>> On Feb 20, 2017, at 1:34 PM, Qihua Liang wrote: >>> >>> Hi Carson, >>> >>> Thanks for your reply! Now I understand the minimal length of SOBA analysis of Maker gene models in GFF3. >>> >>> I am also using SOBA to calculate the statistics of another sources in the GFF3 file, and I have found another strange thing about RepeatMasker annotation and footprint percentage. 
>>> >>> Previously, I ran RepeatMasker outside of Maker once, with my_trained.lib (same as used in Maker), and I had bases masked of ~42% from the output report. >>> In running Maker, I provided both ?model_org=all? and ?rmlib=my_trained.lib?. Under these setting, RepeatMasker should be run twice and the merged results of the twice running will be the output of RepeatMasker in GFF3. I am expecting the bases masked by RepeatMasker in the GFF3 will be more than 42%. >>> >>> But in SOBA calculation, the footprint percentage is only ~18%. Referring to the SOBA paper, footprint is calculated as "non-redundant nucleotide count of all features of a given type?. I assume that when SOBA calculates footprint of RepeatMasker features in GFF3, it should be counting the same as "masked bps" as RepeatMasker itself. >>> >>> When Maker ?combines? the 2 runs of RepeatMasker, is it a merge or an overlapping of 2 RepeatMasker results? >>> Besides, instead of using SOBA, are there any accessory scripts updated in Maker to calculate the statistics of the annotations? >>> >>> Thanks >>> Qihua >>> >>> >>>> On Feb 19, 2017, at 10:05 PM, Carson Holt wrote: >>>> >>>> IN GFF3 the CDS and UTR lengths are actually the merge of all CDSs or UTR features, but SOBA is reporting each part individually which may be causing your confusion. This is because SOBA reports per feature statistics and not merged feature statistics. >>>> >>>> CDS?s do not have to take up entire exons. For example start/stop codons may cross splice sites and be split across exons (very common). The result is that each part of the split CDS becomes a separate feature. As a result SOBA will treat each one separately. So a single bp CDS here is not abnormal, since the remaining part of the CDS continues on the next exon as a separate line. The exact same is true for UTR. >>>> >>>> If you want the merged length of the UTR and CDS, it is bets to pull that info out of the _QI= part of the GFF3 attributes for each mRNA. 
>>>> >>>> What about single bp exons? Those cannot occur unless you gave an input GFF3 with predictions that have single bp exons. The predictors like SNAP and Augustus just won?t produce them, with one exception. They can potentially produce them for the first/last exon. This is not because the exon is 1 bp, but rather because the predictor only reports the CDS part of the exon. As a result if the stop/start codon may have only 1 bp overlapping that exon, but one you add UTR the exon will extend from that point and will no longer be 1bp in length. But if the UTR never gets added, then you can be left with a partial initial/terminal exon. >>>> >>>> However more than likely what you are seeing is just related to how SOBA reports individual feature line stats as opposed to merged stats for CDS and UTR. >>>> >>>> Thanks, >>>> Carson >>>> >>>>> On Feb 18, 2017, at 9:43 AM, Qihua Liang wrote: >>>>> >>>>> Dear Maker develop team, >>>>> >>>>> I used SOBA website to calculate the statistics of Maker annotation, and I found out the length of some features of Maker, like CDS, exon, 5? and 3?UTR, the minimal length of such features can be as short as 1bp. These are confusing, with such features length of 1bp. When Maker combines different gene models and makes such predictions, how will it accept such abnormal exon/CDS length? And is there any parameters in the bopt.ctl or evm.ctl to avoid such abnormal gene models? 
>>>>>
>>>>> Thanks
>>>>> Qihua
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>
>>>
>>
>

From carsonhh at gmail.com Wed Mar 1 13:36:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 13:36:17 -0700
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To:
References:
Message-ID:

If you submit too many simultaneous MAKER runs, file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and use MPI instead (MAKER must be configured and compiled to use MPI).

An example MPI launch command for running on 200 CPUs on a cluster ->
mpiexec -n 200 maker 2> maker_mpi1.error

--Carson

> On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>
> Hello:
>
> I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues with MPI, I submitted the Maker job several times under the same directory to the HPC. Following the example in the protocol (shown below), when I submit the jobs I make them background processes with "&", except the first one. Is this necessary when I submit a job to an HPC? I found it took much longer than I expected (according to a test on a smaller data set). I am not sure whether running the processes in the background leads to this issue?
>
> The example in the protocol
> % maker 2> maker1.error
> % maker 2> maker2.error &
> % maker 2> maker3.error &
> ......
>
> BTW, will the annotation of a shorter contig (e.g., 500bp) cost ~ 1/100 of the time that it costs to annotate a 50000bp contig? I am using SNAP for ab initio prediction, with an RNA-seq assembly and protein sequences as evidence.
I have more than half contigs shorter than 300bp (whose total length is only about 5% of the total length of all contigs), I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs. > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From qwzhang0601 at gmail.com Wed Mar 1 14:09:30 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Wed, 1 Mar 2017 16:09:30 -0500 Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI In-Reply-To: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> Message-ID: Thank you. I have submit my jobs to our server. What I plan to do is like this: (1) split contigs into 50 files; (2) for each contig file, I collected the annotation into gff and protein sequences into fasta format; (3) manually merge the 50 gff files and protein sequences files. Is what I am doing also correct? Best Quanwei 2017-03-01 15:54 GMT-05:00 Carson Holt : > If you split into separate files, you can use the -g option to select the > input file together with the -base option so all output goes to the same > directory. Because they technically have different input files, this will > avoid file locking issues. You have to use the -dsindex option at the end > to rebuild the datastore index, so it looks like a single job. But that is > one way to get around the issue. > > ?Carson > > > > On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote: > > Thank you. But I met some problems with MPI on our server. So now I split > my contigs into several files and annotate those files separately. After I > finish the annotation on each file, I will merge the results. > > Thank you for your explanation! 
> > Best > Quanwei > > 2017-03-01 15:36 GMT-05:00 Carson Holt : > >> If you submit too many simultaneous, MAKER run then file locks will start >> to collide and one run will slow down the others. You should submit fewer >> simultaneous jobs and instead use MPI (maker must be configured and >> compiled to use MPI). >> >> An example MPI launch command for running on 200 CPUs on a cluster ?> >> mpiexec -n 200 maker 2> maker_mpi1.error >> >> ?Carson >> >> >> >> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang >> wrote: >> > >> > Hello: >> > >> > I am doing genome annotation using Maker on our high performance >> computational cluster (HPC). Due to some issues of MPI, I submitted the >> Maker jobs several times under the same directory to HPC. Followed by the >> example in the protocol (as shown below), when I submit the jobs I make >> them as background processes by "&" except the first one. Is this necessary >> when I submit a job to a HPC? I found it costed much much longer time than >> I expected (according to a testing on a smaller data set). I am not sure >> whether setting the process as background process lead to this issue? >> > >> > The example in the protocol >> > % maker 2> maker1.error >> > % maker 2> maker2.error & >> > % maker 2> maker3.error & >> > ...... >> > >> > BTW, will the annotation on shorter contig (e.g., 500bp) cost ~ 1/100 >> of the time that cost for annotation a 50000bp contig? I am using SNAP for >> an inito and RNA-seq assembly and protein sequences as evidence. I have >> more than half contigs shorter than 300bp (whose total length is only about >> 5% of the total length of all contigs), I want to know whether I can save >> about half (or only about 5%) of the time if I ignore those short contigs. 
>> > >> > Thanks >> > >> > Best >> > Quanwei >> > _______________________________________________ >> > maker-devel mailing list >> > maker-devel at box290.bluehost.com >> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Mar 1 14:10:20 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 1 Mar 2017 14:10:20 -0700 Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI In-Reply-To: References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> Message-ID: <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com> That will work. ?Carson > On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote: > > Thank you. I have submit my jobs to our server. What I plan to do is like this: (1) split contigs into 50 files; (2) for each contig file, I collected the annotation into gff and protein sequences into fasta format; (3) manually merge the 50 gff files and protein sequences files. Is what I am doing also correct? > > Best > Quanwei > > 2017-03-01 15:54 GMT-05:00 Carson Holt >: > If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue. > > ?Carson > > > >> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang > wrote: >> >> Thank you. But I met some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation on each file, I will merge the results. >> >> Thank you for your explanation! 
>> >> Best >> Quanwei >> >> 2017-03-01 15:36 GMT-05:00 Carson Holt >: >> If you submit too many simultaneous, MAKER run then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (maker must be configured and compiled to use MPI). >> >> An example MPI launch command for running on 200 CPUs on a cluster ?> >> mpiexec -n 200 maker 2> maker_mpi1.error >> >> ?Carson >> >> >> >> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang > wrote: >> > >> > Hello: >> > >> > I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues of MPI, I submitted the Maker jobs several times under the same directory to HPC. Followed by the example in the protocol (as shown below), when I submit the jobs I make them as background processes by "&" except the first one. Is this necessary when I submit a job to a HPC? I found it costed much much longer time than I expected (according to a testing on a smaller data set). I am not sure whether setting the process as background process lead to this issue? >> > >> > The example in the protocol >> > % maker 2> maker1.error >> > % maker 2> maker2.error & >> > % maker 2> maker3.error & >> > ...... >> > >> > BTW, will the annotation on shorter contig (e.g., 500bp) cost ~ 1/100 of the time that cost for annotation a 50000bp contig? I am using SNAP for an inito and RNA-seq assembly and protein sequences as evidence. I have more than half contigs shorter than 300bp (whose total length is only about 5% of the total length of all contigs), I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs. 
>> >
>> > Thanks
>> >
>> > Best
>> > Quanwei
>> > _______________________________________________
>> > maker-devel mailing list
>> > maker-devel at box290.bluehost.com
>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Mar 1 17:43:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 17:43:30 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de>
 <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de>
 <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
 <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
 <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
Message-ID: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>

Try this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

If both of these fail, there is the chance that the Intel MPI you are using was compiled on a different architecture than the one you are launching it on. In that case the failure indicates a need to reinstall Intel MPI for that architecture.
The following may or may not work if the first two fail:

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

Also send me this file -> perl/lib/MAKER/ConfigData.pm

Thanks,
Carson

> On Mar 1, 2017, at 5:51 AM, Rainer Rutka wrote:
>
>
> Sorry, sent wrong e-mail :-(
>
> IGNORE THE FIRST MAIL I SENT!
>
> On 01.03.2017 at 13:30, Rainer Rutka wrote:
> Hi Carson.
> Again THANK YOU for your efforts :-)
> On 24.02.2017 at 18:30, Carson Holt wrote:
>> Specific things.
>>
>> 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause
>> problems with other MPI's.
>
> OK, I deleted this environment variable. It is not set any more.
>
>> 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has
>> to be compiled for the flavor you are using, so make sure you have a
>> separate installation of MAKER for Intel MPI). Also validate that the
>> mpicc and libmpi.h listed during the MAKER install belong to Intel
>> MPI. Don't just assume they do because you loaded the module. Manually
>> verify the paths during MAKER's setup.
>
> I validated:
> UC:[kn at uc1n996 bwhpc-examples]$ module list
> Currently Loaded Modulefiles:
> 1) compiler/intel/16.0(default)
> 2) mpi/impi/5.1.3-intel-16.0(default)
> FOR MPICC:
> UC:[kn at uc1n996 bwhpc-examples]$ type mpicc
> mpicc is
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc
> FOR LIBMPI:
> UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64
> UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h
> Here I can find an mpi.h but not a libmpi.h.
But I thinks this is o.k., > because the SW was compiled and linkes without any errors or missing libs. > >> 3. The error you got previously should not even be possible with the >> current version of Intel MPI, >> which is why I say that when you called mpiexec, something else (that >> was not Intel MPI) was launched. >> Easy solution is to give the full path of mpiexec in your job, so are >> not relying on PATH to be unaltered in your job. > > mpiexec is in the PATH and the right one is/was used, too: > MPIXEC: > UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec > mpiexec is > /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec > >> Do not do ?> mpiexec -nc 1 maker >> Do this for example ?> >> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec >> -nc maker > OK, so i did: > [...] > #MSUB -l nodes=1:ppn=1 > #MSUB -l mem=20gb > [...] > echo " " > echo "### Runing Maker example" > echo " " > export OMPI_MCA_mpi_warn_on_fork=0 > /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec > -nc maker > [...] > >> 4. Build and run on the same node for your test. If you build on one >> node and run on another, you may >> be changing your environment in ways you don?t realize that break >> things. So if you can build and test on >> the same node and it works, then it fails when you test it elsewhere, >> then you have to track down how your >> environment is changing. > > OK I did. Same node: uc1n996 > UNFORTUNATELY I GOT THE SAME ERROR: > [...] > Currently Loaded Modulefiles: > 1) compiler/intel/16.0(default) > 2) mpi/impi/5.1.3-intel-16.0(default) > 3) bio/maker/2.31.8_impi > > > ### Display internal Maker/bwHPC environments... 
> > MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bin > MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bwhpc-examples > > > ### Runing Maker example > OMPI_MCA_mpi_warn_on_fork=0 > I_MPI_CPUINFO=proc > I_MPI_PMI_LIBRARY=/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/lib/libmpi.so > I_MPI_PIN_DOMAIN=node > I_MPI_FABRICS=shm:tcp > I_MPI_HYDRA_IFACE=ib0 > mpiexec_uc1n326.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n326.localdomain_kn_pop235844); possible causes: > 1. no mpd is running on this host > 2. an mpd is running but was started without a "console" (-n option) > ### Cleaning up files ... removing unnecessary scratch files ... > [...] > >> ?Carson > tbc. ? :-) > THANX > > -- > Rainer Rutka > Universit?t Konstanz > Kommunikations-, Informations-, Medienzentrum (KIM) > * KIM Ausbildung > * Wissenschaftliches Rechnen/bwHPC-C5 > * KIM Basisdienste, KIM Support > Raum: V511 > 78457 Konstanz > +49 7531 88-5413 > From rainer.rutka at uni-konstanz.de Thu Mar 2 01:41:37 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Thu, 2 Mar 2017 09:41:37 +0100 Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE In-Reply-To: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> Message-ID: Hi Carson! 
On 02.03.2017 at 01:43, Carson Holt wrote:
> Try this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello
> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h
Same error(s).

> If both of these fail, there is the chance that the Intel MPI you are using was compiled on a different architecture than the one you are launching it on. In that case the failure indicates a need to reinstall Intel MPI for that architecture.
Yes, they fail.

> The following may or may not work if the first two fail:
> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello
WORKS FINE!

> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h
WORKS!

> Also send me this file -> perl/lib/MAKER/ConfigData.pm
Attached to this mail.

> Thanks,
> Carson

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ConfigData.pm
Type: application/x-perl
Size: 5424 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 5055 bytes Desc: S/MIME Cryptographic Signature URL: From rainer.rutka at uni-konstanz.de Thu Mar 2 02:07:07 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Thu, 2 Mar 2017 10:07:07 +0100 Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE In-Reply-To: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> Message-ID: <6cd0a8c5-e6a5-a171-5f80-11d193627aeb@uni-konstanz.de> > The following may or may not work if the first two fail: > Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello > Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h mpirun, !mpiexec is running, too! -- Rainer Rutka University of Konstanz Communication, Information, Media Centre (KIM) * High-Performance-Computing (HPC) * KIM-Support and -Base-Services Room: V511 78457 Konstanz, Germany +49 7531 88-5413 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From carsonhh at gmail.com Thu Mar 2 10:41:35 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 2 Mar 2017 10:41:35 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de>
 <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de>
 <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
 <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
 <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
 <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
Message-ID: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>

This command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

All that command does is start the launcher and print "Hello". So since it failed, it means the issue is with your MPI installation (i.e. Intel MPI itself). It would have to be reinstalled and recompiled. I would not be surprised if the issues with the other MPI flavors you tried were for the same reason. They were installed for one architecture/compiler/library set, but you are running them on another one. So they always fail.

The second command was an alternate launcher, but it relies on the same underlying libraries as the first one. So if the first one failed, the second one may fail (it may just happen later on).

So the issue boils down to one thing -> Your MPI is the issue. You need to reinstall/reconfigure, and once you can get your MPI working, you can move on to trying MAKER.

Thanks,
Carson

> On Mar 2, 2017, at 1:41 AM, Rainer Rutka wrote:
>
> Hi Carson!
> > Am 02.03.2017 um 01:43 schrieb Carson Holt: >> Try this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello >> Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h > Same error(s). > >> If both of these fail, there is the chance that the Intel MPI you are using was compiled on a different architecture than the one you are launching it on. In that case the failure indicates a need to reinstall Intel MPI for that architecture. > Yes, they fail. > >> The following may or may not work if the first two fail: >> Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello > WORKS FINE! > >> Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h > WORKS! > >> Also send me this file ?> perl/lib/MAKER/ConfigData.pm > Attached to this mail. > >> Thanks, >> Carson > > -- > Rainer Rutka > University of Konstanz > Communication, Information, Media Centre (KIM) > * High-Performance-Computing (HPC) > * KIM-Support and -Base-Services > Room: V511 > 78457 Konstanz, Germany > +49 7531 88-5413 > From mnaymik at tgen.org Thu Mar 2 13:05:22 2017 From: mnaymik at tgen.org (Marcus Naymik) Date: Thu, 2 Mar 2017 13:05:22 -0700 Subject: [maker-devel] ThrowNullPointerException() Message-ID: I have maker running with MPI and I get this error over and over again for every contig. Any Ideas? MAKER WARNING: All old files will be erased before continuing #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: 5239 Length: 1395 #--------------------------------------------------------------------- Error: NCBI C++ Exception: "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti -- *This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Mar 2 13:25:59 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 2 Mar 2017 13:25:59 -0700 Subject: [maker-devel] ThrowNullPointerException() In-Reply-To: References: Message-ID: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com> Try reinstalling blast, or upgrade to a newer version of blast. ?Carson > On Mar 2, 2017, at 1:05 PM, Marcus Naymik wrote: > > > I have maker running with MPI and I get this error over and over again for every contig. Any Ideas? > > > > MAKER WARNING: All old files will be erased before continuing > > #--------------------------------------------------------------------- > > Now starting the contig!! > > SeqID: 5239 > > Length: 1395 > > #--------------------------------------------------------------------- > > > > > > Error: NCBI C++ Exception: > > "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti > > > > > > This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. 
If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From d.ence at ufl.edu Fri Mar 3 09:48:34 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Fri, 3 Mar 2017 16:48:34 +0000
Subject: [maker-devel] how to deal with Contigs to run maker?
In-Reply-To: <2017022815435664227911@cau.edu.cn>
References: <2017022815435664227911@cau.edu.cn>
Message-ID: <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>

Hi Chao,

I don't think merging the contigs is a good idea. Unless you actually know the distances (in base pairs) between the contigs, this could lead to many spurious alignments. I think you should leave them separate in your fasta file for RepeatModeler, ab-initio training, and running maker. If you're worried about short contigs in your assembly, you can exclude shorter contigs with the min_contig option in the maker_opts control file.

~Daniel

On Feb 28, 2017, at 2:43 AM, dcg at cau.edu.cn wrote:

Dear sir:
After assembling, I got many contigs and their order in each chromosome.
What I have done is merge these contigs into chromosomes following that order, with 100 Ns inserted between adjacent contigs, so that I got chr1, chr2, ... Then I ran RepeatModeler and the gene predictor to annotate it.

Could my way reach a high-quality result? Should I use all the contigs to mask repeats and train the predictor?
Is there any better way to do genome-wide annotation?
I'm looking forward to your reply!
Best wishes!

Chao Chao
________________________________
2017.02.28
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Mar 3 10:32:15 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 3 Mar 2017 10:32:15 -0700
Subject: [maker-devel] how to deal with Contigs to run maker?
In-Reply-To: <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>
References: <2017022815435664227911@cau.edu.cn>
 <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>
Message-ID: <7CF3A765-5A93-42B2-AA28-4596CD25A459@gmail.com>

I agree. Also, a 100 bp insert of N's will essentially be ignored by aligners and predictors. They'll jump across it as if it were just an intron, resulting in false merges and bad predictions.

--Carson

> On Mar 3, 2017, at 9:48 AM, Ence,daniel wrote:
>
> Hi Chao, I don't think merging the contigs is a good idea. Unless you actually know the distances (in base pairs) between the contigs, this could lead to many spurious alignments. I think you should leave them separate in your fasta file for RepeatModeler, ab-initio training, and running maker. If you're worried about short contigs in your assembly, you can exclude shorter contigs with the min_contig option in the maker_opts control file.
>
> ~Daniel
>
>
>> On Feb 28, 2017, at 2:43 AM, dcg at cau.edu.cn wrote:
>>
>> Dear sir:
>> After assembling, I got many contigs and their order in each chromosome.
>> What I have done is merge these contigs into chromosomes following that order, with 100 Ns inserted between adjacent contigs, so that I got chr1, chr2, ... Then I ran RepeatModeler and the gene predictor to annotate it.
>>
>> Could my way reach a high-quality result? Should I use all the contigs to mask repeats and train the predictor?
>> Is there any better way to do genome-wide annotation? >> >> I'm looking forward to your reply! >> Best wishes! >> >> Chao Chao >> 2017.02.28 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainer.rutka at uni-konstanz.de Mon Mar 6 01:21:20 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Mon, 6 Mar 2017 09:21:20 +0100 Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE In-Reply-To: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com> References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com> Message-ID: Hi Carson. Again thank you for your response. But - sorry to say - it's not possible our MPI is corrupt. We have approx. 1.500 users working on our bwUniCluster so far. 95 % of these users use MPI. And: All our other software (see: cis-hpc.uni-konstanz.de ) is running with our implementations of IMPI/OMPI without any issues. :-() Am 02.03.2017 um 18:41 schrieb Carson Holt: > This command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello > > All that command does is start the launcher and print ?Hello?. So since it failed, it means the issue is with your MPI installation (i.e. Intel MPI itself). 
It would have to be reinstalled and recompiled. I would not be surprised if the issues with the other MPI flavors you tried were for the same reason. They were installed for one architecture/compiler/library set, but you are running them on another one. So they always fail. > > The second command was an alternate launcher, but it relys on the same underlying libraries as the first one. So if the first one failed, the second one may fail (it may just happen later on). > > > So the issue boils down to one thing ?> Your MPI is the issue. You need to reinstall/reconfigure and once you can get your MPI working, you can move onto trying MAKER. > > Thanks, > Carson > > > >> On Mar 2, 2017, at 1:41 AM, Rainer Rutka wrote: >> >> Hi Carson! >> >> Am 02.03.2017 um 01:43 schrieb Carson Holt: >>> Try this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello >>> Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h >> Same error(s). >> >>> If both of these fail, there is the chance that the Intel MPI you are using was compiled on a different architecture than the one you are launching it on. In that case the failure indicates a need to reinstall Intel MPI for that architecture. >> Yes, they fail. >> >>> The following may or may not work if the first two fail: >>> Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello >> WORKS FINE! >> >>> Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h >> WORKS! >> >>> Also send me this file ?> perl/lib/MAKER/ConfigData.pm >> Attached to this mail. 
>>
>>> Thanks,
>>> Carson
>>
>> --
>> Rainer Rutka
>> University of Konstanz
>> Communication, Information, Media Centre (KIM)
>> * High-Performance-Computing (HPC)
>> * KIM-Support and -Base-Services
>> Room: V511
>> 78457 Konstanz, Germany
>> +49 7531 88-5413
>>
>

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* KIM Training
* Scientific Computing/bwHPC-C5
* KIM Basic Services, KIM Support
Room: V511
78457 Konstanz
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From carsonhh at gmail.com Mon Mar 6 07:47:51 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 6 Mar 2017 07:47:51 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de>
 <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de>
 <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
 <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
 <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
 <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
 <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID: <9B00FB6A-B5F5-4240-AB1E-4CBEEEB63C7F@gmail.com>

I was able to replicate the error as follows:

1. Intel MPI installed on CentOS kernel 6 (MPI works fine)
2. Upgrade to kernel 7 without reinstalling and Intel MPI reports the same error as reported by the user.
3. After recompiling Intel MPI on kernel 7 the error goes away.

The proof that there is an issue with your Intel MPI installation is in this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

That command is simply trying to get mpiexec to launch "echo Hello" internally. And it failed. It's as simple as that.
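
The launcher smoke test used in this thread (run a trivial command such as `echo Hello` through each launcher and see whether it starts) can be scripted. The following is only a minimal sketch: the helper names `launcher_works` and `first_working_launcher` are hypothetical and not part of MAKER or Intel MPI, and a passing probe only shows that the launcher can start processes, not that a full MPI job will run.

```python
import subprocess
from typing import List, Optional, Sequence


def launcher_works(launcher: Sequence[str],
                   probe: Sequence[str] = ("echo", "Hello"),
                   timeout: float = 60.0) -> bool:
    """Return True if `launcher + probe` exits with status 0 within the timeout."""
    try:
        result = subprocess.run(list(launcher) + list(probe),
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Launcher is missing, not executable, or did not return in time.
        return False
    return result.returncode == 0


def first_working_launcher(candidates: Sequence[Sequence[str]],
                           probe: Sequence[str] = ("echo", "Hello")
                           ) -> Optional[List[str]]:
    """Try each candidate launcher in order and return the first one that passes."""
    for launcher in candidates:
        if launcher_works(launcher, probe):
            return list(launcher)
    return None
```

With the paths quoted in this thread, the candidates would be the full path to `mpiexec` followed by `-n 2`, then the full path to `mpiexec.hydra` followed by `-n 2`. Giving full paths keeps the check independent of whatever `PATH` the batch job ends up with.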
Thanks, Carson > On Mar 6, 2017, at 1:21 AM, Rainer Rutka wrote: > > > Hi Carson. > > Again thank you for your response. > > But - sorry to say - it's not possible our MPI is corrupt. > We have approx. 1.500 users working on our bwUniCluster so far. 95 % > of these users use MPI. And: All our other software (see: > > cis-hpc.uni-konstanz.de ) > > is running with our implementations of IMPI/OMPI without any > issues. > > :-() > > > Am 02.03.2017 um 18:41 schrieb Carson Holt: >> This command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello >> >> All that command does is start the launcher and print ?Hello?. So since it failed, it means the issue is with your MPI installation (i.e. Intel MPI itself). It would have to be reinstalled and recompiled. I would not be surprised if the issues with the other MPI flavors you tried were for the same reason. They were installed for one architecture/compiler/library set, but you are running them on another one. So they always fail. >> >> The second command was an alternate launcher, but it relys on the same underlying libraries as the first one. So if the first one failed, the second one may fail (it may just happen later on). >> >> >> So the issue boils down to one thing ?> Your MPI is the issue. You need to reinstall/reconfigure and once you can get your MPI working, you can move onto trying MAKER. >> >> Thanks, >> Carson >> >> >> >>> On Mar 2, 2017, at 1:41 AM, Rainer Rutka wrote: >>> >>> Hi Carson! >>> >>> Am 02.03.2017 um 01:43 schrieb Carson Holt: >>>> Try this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello >>>> Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h >>> Same error(s). 
>>> >>>> If both of these fail, there is the chance that the Intel MPI you are using was compiled on a different architecture than the one you are launching it on. In that case the failure indicates a need to reinstall Intel MPI for that architecture. >>> Yes, they fail. >>> >>>> The following may or may not work if the first two fail: >>>> Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello >>> WORKS FINE! >>> >>>> Then this command ?> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h >>> WORKS! >>> >>>> Also send me this file ?> perl/lib/MAKER/ConfigData.pm >>> Attached to this mail. >>> >>>> Thanks, >>>> Carson >>> >>> -- >>> Rainer Rutka >>> University of Konstanz >>> Communication, Information, Media Centre (KIM) >>> * High-Performance-Computing (HPC) >>> * KIM-Support and -Base-Services >>> Room: V511 >>> 78457 Konstanz, Germany >>> +49 7531 88-5413 >>> >> > > -- > Rainer Rutka > Universit?t Konstanz > Kommunikations-, Informations-, Medienzentrum (KIM) > * KIM Ausbildung > * Wissenschaftliches Rechnen/bwHPC-C5 > * KIM Basisdienste, KIM Support > Raum: V511 > 78457 Konstanz > +49 7531 88-5413 > From dussert.yann at gmail.com Mon Mar 6 09:51:59 2017 From: dussert.yann at gmail.com (YannDussert) Date: Mon, 6 Mar 2017 17:51:59 +0100 Subject: [maker-devel] Differences in non_overlapping protein file between runs Message-ID: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com> Hello, First, thank you for developing MAKER, this is a great annotation tool! I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. 
I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.

After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000 and 17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation on a genome browser, one of the problems was apparently gene fusions due to bad protein evidence.

Following the advice on another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence; did I miss something?

Thank you in advance for your answer.

Best regards,
Yann

From dcg at cau.edu.cn Sun Mar 5 04:26:59 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Sun, 5 Mar 2017 19:26:59 +0800
Subject: [maker-devel] For help about masking repeats before annotation
Message-ID: <2017030519265949065818@cau.edu.cn>

Dear sir:
Before running MAKER, I first do repeat masking on my contigs. However, when I followed "Repeat Library Construction-Advanced", no results were generated after running LTRharvest, so I could not go any further. When I tried to follow "Repeat Library Construction-Basic" to run RepeatModeler, a note caught my attention, even though RECON returned some results:
NOTE: RepeatScout did not return any models.

Is the situation above normal in the masking process? How can I deal with these problems to build a high-quality repeat library for my assembled contigs?

Hope to hear from you.
Best wishes!
Chao Chao
2017.03.05
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dcg at cau.edu.cn Mon Mar 6 05:24:17 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Mon, 6 Mar 2017 20:24:17 +0800
Subject: [maker-devel] How to merge the annotation results into chromosomes?
Message-ID: <2017030620241723514513@cau.edu.cn>

Dear sir:

Hello, I am doing my utmost to study annotation now. However, I have recently been confused about handling the results. After alignment, practice and curation, we can get good gene models and merge them with gff3_merge and fasta_merge. But how can I merge them into different chromosomes like Homo_sapiens.GRCh38.87.chromosome.11.gff3.gz? I don't just want results for different contigs.

I'm looking forward to your reply. Thanks a lot!

Best wishes!
Chao Chao
2017.03.06
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lucys-world at mailbox.org Mon Mar 6 07:40:33 2017
From: lucys-world at mailbox.org (lucys-world at mailbox.org)
Date: Mon, 6 Mar 2017 15:40:33 +0100 (CET)
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
Message-ID: <850873370.6534.1488811234072@office.mailbox.org>

Dear maker-devel group,

I have some issues with my maker ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP. After two maker runs I wanted to create a new HMM for the third maker run, but the command

fathom genome.ann genoma.dna -gene-stats

resulted in 0 genes.

What I have done so far:

  * For the first training run I only used BUSCO and the Swiss-Prot database as references (since no ESTs are available for my species).
    Additionally I set protein2genome=1
  * I was able to create an HMM based on all merged *.gff files. But these were not many:
      o out of 27,032 scaffolds (sequences) only 280 were used for the HMM; here are the gene-stats:
      o 280 sequences
        0.458676 avg GC fraction (min=0.338014 max=0.708052)
        7445 genes (plus=3192 minus=4253)
        1621 (0.217730) single-exon
        5824 (0.782270) multi-exon
        168.412018 mean exon (min=1 max=5224)
        1464.349243 mean intron (min=30 max=41197)
  * For the second maker run I then used this HMM and again the BUSCO+SwissPort.fasta reference file.
      o the gene-stats for the output of the second maker run are:
      o 282 sequences
        0.473125 avg GC fraction (min=0.338014 max=0.725131)
        0 genes (plus=0 minus=0)
        0 (-nan) single-exon
        0 (-nan) multi-exon
        -nan mean exon (min=2147483647 max=0)
        -nan mean intron (min=2147483647 max=0)

Would you recommend rerunning everything, e.g. with an additional Augustus gene prediction (species=human), or with ESTs from a related species? (If so, how closely related?)

Thank you for your time and help

kind regards
Lucy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From d.ence at ufl.edu Mon Mar 6 10:11:57 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Mon, 6 Mar 2017 17:11:57 +0000
Subject: [maker-devel] How to merge the annotation results into chromosomes?
In-Reply-To: <2017030620241723514513@cau.edu.cn>
References: <2017030620241723514513@cau.edu.cn>
Message-ID: <45D1390D-212D-42A4-9819-C0045601B013@mail.ufl.edu>

Hi,

Do you have data that can precisely place each of your contigs in its position on the chromosome? Without that, this isn't even possible, since a gff3 file with the chromosomes instead of the contigs requires each contig's position in the chromosome. And in any case, I don't think there is a script in the maker tools that does what you're asking. Maybe someone else has made a script to do that.
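
To illustrate why the placement data is needed: once each contig's chromosome, offset and orientation are known (for example from an AGP file), moving GFF3 features onto chromosome coordinates is mostly arithmetic. The following is only a minimal Python sketch under that assumption. The `Placement` table and the function name are hypothetical, it is not a MAKER script, and it does not rewrite `##sequence-region` pragmas or any embedded FASTA section.

```python
from typing import Dict, NamedTuple


class Placement(NamedTuple):
    """Where one contig sits on a chromosome (hypothetical table, e.g. derived from an AGP file)."""
    chrom: str   # chromosome name to write into column 1
    start: int   # 1-based chromosome position of the contig's leftmost base
    length: int  # contig length in bp
    strand: str  # "+" if placed forward, "-" if reverse-complemented


def lift_gff3_line(line: str, placements: Dict[str, Placement]) -> str:
    """Return the GFF3 line with contig coordinates replaced by chromosome coordinates."""
    # Comments, pragmas and blank lines are passed through untouched.
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    # Anything that is not a 9-column feature on a placed contig is left as-is.
    if len(cols) != 9 or cols[0] not in placements:
        return line
    p = placements[cols[0]]
    start, end = int(cols[3]), int(cols[4])
    if p.strand == "+":
        new_start, new_end = p.start + start - 1, p.start + end - 1
    else:
        # Reverse-complemented contig: coordinates are mirrored and the strand flips.
        new_start, new_end = p.start + p.length - end, p.start + p.length - start
        cols[6] = {"+": "-", "-": "+"}.get(cols[6], cols[6])
    cols[0], cols[3], cols[4] = p.chrom, str(new_start), str(new_end)
    return "\t".join(cols) + newline
```

Without a table like `placements` there is nothing to compute the new coordinates from, which is the point made above.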
~Daniel

On Mar 6, 2017, at 7:24 AM, dcg at cau.edu.cn wrote:

Dear sir:

Hello, I am doing my utmost to study annotation now. However, I have recently been confused about handling the results. After alignment, practice and curation, we can get good gene models and merge them with gff3_merge and fasta_merge. But how can I merge them into different chromosomes like Homo_sapiens.GRCh38.87.chromosome.11.gff3.gz? I don't just want results for different contigs.

I'm looking forward to your reply. Thanks a lot!

Best wishes!
Chao Chao
________________________________
2017.03.06
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From d.ence at ufl.edu Mon Mar 6 10:15:07 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Mon, 6 Mar 2017 17:15:07 +0000
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
In-Reply-To: <850873370.6534.1488811234072@office.mailbox.org>
References: <850873370.6534.1488811234072@office.mailbox.org>
Message-ID: <970801D9-536E-494C-B5C7-F5F72125FAFC@mail.ufl.edu>

Hi Lucy,

What were your settings for the second training run? Did you leave protein2genome=1?

~Daniel

On Mar 6, 2017, at 9:40 AM, lucys-world at mailbox.org wrote:

Dear maker-devel group,

I have some issues with my maker ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP. After two maker runs I wanted to create a new HMM for the third maker run, but the command

fathom genome.ann genoma.dna -gene-stats

resulted in 0 genes.

What I have done so far:

  * For the first training run I only used BUSCO and the Swiss-Prot database as references (since no ESTs are available for my species).
    Additionally I set protein2genome=1
  * I was able to create an HMM based on all merged *.gff files. But these were not many:
      * out of 27,032 scaffolds (sequences) only 280 were used for the HMM; here are the gene-stats:
      * 280 sequences
        0.458676 avg GC fraction (min=0.338014 max=0.708052)
        7445 genes (plus=3192 minus=4253)
        1621 (0.217730) single-exon
        5824 (0.782270) multi-exon
        168.412018 mean exon (min=1 max=5224)
        1464.349243 mean intron (min=30 max=41197)
  * For the second maker run I then used this HMM and again the BUSCO+SwissPort.fasta reference file.
      * the gene-stats for the output of the second maker run are:
      * 282 sequences
        0.473125 avg GC fraction (min=0.338014 max=0.725131)
        0 genes (plus=0 minus=0)
        0 (-nan) single-exon
        0 (-nan) multi-exon
        -nan mean exon (min=2147483647 max=0)
        -nan mean intron (min=2147483647 max=0)

Would you recommend rerunning everything, e.g. with an additional Augustus gene prediction (species=human), or with ESTs from a related species? (If so, how closely related?)

Thank you for your time and help

kind regards
Lucy
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Mar 6 12:48:49 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 6 Mar 2017 12:48:49 -0700
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
In-Reply-To: <850873370.6534.1488811234072@office.mailbox.org>
References: <850873370.6534.1488811234072@office.mailbox.org>
Message-ID: <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com>

It looks like you have no genes to train with. So something went wrong on your second run. Either no gene predictor was running or you provided no evidence for the predictor, so you produced no models.
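
One quick way to check this before running fathom is to count the features in the merged GFF3 by source and type. If no rows with source `maker` and type `gene` show up, there are no gene models to convert for training. The following is a minimal Python sketch, not part of MAKER; the function name and the file name in the usage note are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_features(lines: Iterable[str]) -> Counter:
    """Count GFF3 feature rows by (source, type), ignoring comments and embedded FASTA."""
    counts: Counter = Counter()
    for line in lines:
        # Everything after the ##FASTA pragma is sequence, not annotation.
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[(cols[1], cols[2])] += 1
    return counts
```

For example, `count_features(open("genome.all.gff"))[("maker", "gene")]` returning 0 would be consistent with the gene-stats shown above, while non-zero counts for sources such as `protein2genome` or `snap_masked` would show which evidence and predictors actually ran.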
--Carson

> On Mar 6, 2017, at 7:40 AM, lucys-world at mailbox.org wrote:
>
> Dear maker-devel group,
>
> I have some issues with my maker ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP.
> After two maker runs I wanted to create a new HMM for the third maker run, but the command
>
> fathom genome.ann genoma.dna -gene-stats
>
> resulted in 0 genes.
>
> What I have done so far:
>
> For the first training run I only used BUSCO and the Swiss-Prot database as references (since no ESTs are available for my species). Additionally I set protein2genome=1
>
> I was able to create an HMM based on all merged *.gff files. But these were not many:
> out of 27,032 scaffolds (sequences) only 280 were used for the HMM; here are the gene-stats:
> 280 sequences
> 0.458676 avg GC fraction (min=0.338014 max=0.708052)
> 7445 genes (plus=3192 minus=4253)
> 1621 (0.217730) single-exon
> 5824 (0.782270) multi-exon
> 168.412018 mean exon (min=1 max=5224)
> 1464.349243 mean intron (min=30 max=41197)
>
> For the second maker run I then used this HMM and again the BUSCO+SwissPort.fasta reference file.
> the gene-stats for the output of the second maker run are:
> 282 sequences
> 0.473125 avg GC fraction (min=0.338014 max=0.725131)
> 0 genes (plus=0 minus=0)
> 0 (-nan) single-exon
> 0 (-nan) multi-exon
> -nan mean exon (min=2147483647 max=0)
> -nan mean intron (min=2147483647 max=0)
>
> Would you recommend rerunning everything, e.g. with an additional Augustus gene prediction (species=human), or with ESTs from a related species? (If so, how closely related?)
>
> Thank you for your time and help
>
> kind regards
>
> Lucy
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From qwzhang0601 at gmail.com Tue Mar 7 08:14:11 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 7 Mar 2017 10:14:11 -0500
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To: <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>
Message-ID:

Hi Carson:

I split my contigs into 50 files and annotated them in parallel. After the annotation finished, I used "gff3_merge -d" and "fasta_merge -d" to get the gff and fasta files for each of the 50 files. Now I am trying to merge those gff files into one gff. But I found that, after the annotation information, the contig sequences are appended to the gff files. So I think I cannot simply merge them using the command "cat file1.gff file2.gff ...file50.gff > merged.gff". So I am considering merging those files in one of two ways; would you please suggest which one works?
(1) If the contig sequences will not be useful for downstream functional annotation, then I want to remove all the contig sequences from those gff files, and then merge the gff files with only annotation information using the "cat" command.
(2) Merge the annotation part and the contig sequence part (from those 50 gff files) separately, then merge the two files (i.e., the file including all annotation information, and the file including all the contig sequences) by adding the contig sequences to the end of the annotation information.

Thanks

2017-03-01 16:10 GMT-05:00 Carson Holt :

> That will work.
>
> --Carson
>
> On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote:
>
> Thank you. I have submitted my jobs to our server. What I plan to do is this: (1) split the contigs into 50 files; (2) for each contig file, collect the annotation in gff format and the protein sequences in fasta format; (3) manually merge the 50 gff files and protein sequence files. Is what I am doing also correct?
>
> Best
> Quanwei
>
> 2017-03-01 15:54 GMT-05:00 Carson Holt :
>
>> If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue.
>>
>> --Carson
>>
>> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>>
>> Thank you. But I met some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation on each file, I will merge the results.
>>
>> Thank you for your explanation!
>>
>> Best
>> Quanwei
>>
>> 2017-03-01 15:36 GMT-05:00 Carson Holt :
>>
>>> If you submit too many simultaneous MAKER runs, then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (maker must be configured and compiled to use MPI).
>>>
>>> An example MPI launch command for running on 200 CPUs on a cluster -->
>>> mpiexec -n 200 maker 2> maker_mpi1.error
>>>
>>> --Carson
>>>
>>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>>> >
>>> > Hello:
>>> >
>>> > I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues with MPI, I submitted the Maker jobs several times under the same directory to the HPC. Following the example in the protocol (as shown below), when I submit the jobs I make them background processes with "&", except the first one. Is this necessary when I submit a job to an HPC? I found it took much longer than I expected (according to a test on a smaller data set). I am not sure whether running the processes in the background leads to this issue?
>>> >
>>> > The example in the protocol
>>> > % maker 2> maker1.error
>>> > % maker 2> maker2.error &
>>> > % maker 2> maker3.error &
>>> > ......
>>> >
>>> > BTW, will the annotation of a shorter contig (e.g., 500bp) take ~1/100 of the time needed to annotate a 50000bp contig? I am using SNAP for ab initio prediction, and an RNA-seq assembly and protein sequences as evidence. More than half of my contigs are shorter than 300bp (their total length is only about 5% of the total length of all contigs). I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs.
>>> >
>>> > Thanks
>>> >
>>> > Best
>>> > Quanwei
>>> > _______________________________________________
>>> > maker-devel mailing list
>>> > maker-devel at box290.bluehost.com
>>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Mar 7 08:35:42 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 7 Mar 2017 08:35:42 -0700
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To:
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>
Message-ID:

Use gff3_merge again without the -d option. Just give it all 50 files.

--Carson

Sent from my iPhone

> On Mar 7, 2017, at 8:14 AM, Quanwei Zhang wrote:
>
> Hi Carson:
>
> I split my contigs into 50 files and annotated them in parallel. After the annotation finished, I used "gff3_merge -d" and "fasta_merge -d" to get the gff and fasta files for each of the 50 files. Now I am trying to merge those gff files into one gff. But I found that, after the annotation information, the contig sequences are appended to the gff files. So I think I cannot simply merge them using the command "cat file1.gff file2.gff ...file50.gff > merged.gff".
> So I am considering merging those files in one of two ways; would you please suggest which one works?
> (1) If the contig sequences will not be useful for downstream functional annotation, then I want to remove all the contig sequences from those gff files, and then merge the gff files with only annotation information using the "cat" command.
> (2) Merge the annotation part and the contig sequence part (from those 50 gff files) separately, then merge the two files (i.e., the file including all annotation information, and the file including all the contig sequences) by adding the contig sequences to the end of the annotation information.
>
> Thanks
>
> 2017-03-01 16:10 GMT-05:00 Carson Holt :
>> That will work.
>>
>> --Carson
>>
>>> On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote:
>>>
>>> Thank you. I have submitted my jobs to our server. What I plan to do is this: (1) split the contigs into 50 files; (2) for each contig file, collect the annotation in gff format and the protein sequences in fasta format; (3) manually merge the 50 gff files and protein sequence files. Is what I am doing also correct?
>>>
>>> Best
>>> Quanwei
>>>
>>> 2017-03-01 15:54 GMT-05:00 Carson Holt :
>>>> If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue.
>>>>
>>>> --Carson
>>>>
>>>>> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>>>>>
>>>>> Thank you. But I met some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation on each file, I will merge the results.
>>>>>
>>>>> Thank you for your explanation!
>>>>>
>>>>> Best
>>>>> Quanwei
>>>>>
>>>>> 2017-03-01 15:36 GMT-05:00 Carson Holt :
>>>>>> If you submit too many simultaneous MAKER runs, then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (maker must be configured and compiled to use MPI).
>>>>>>
>>>>>> An example MPI launch command for running on 200 CPUs on a cluster -->
>>>>>> mpiexec -n 200 maker 2> maker_mpi1.error
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>>>>>> >
>>>>>> > Hello:
>>>>>> >
>>>>>> > I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues with MPI, I submitted the Maker jobs several times under the same directory to the HPC. Following the example in the protocol (as shown below), when I submit the jobs I make them background processes with "&", except the first one. Is this necessary when I submit a job to an HPC? I found it took much longer than I expected (according to a test on a smaller data set). I am not sure whether running the processes in the background leads to this issue?
>>>>>> >
>>>>>> > The example in the protocol
>>>>>> > % maker 2> maker1.error
>>>>>> > % maker 2> maker2.error &
>>>>>> > % maker 2> maker3.error &
>>>>>> > ......
>>>>>> >
>>>>>> > BTW, will the annotation of a shorter contig (e.g., 500bp) take ~1/100 of the time needed to annotate a 50000bp contig? I am using SNAP for ab initio prediction, and an RNA-seq assembly and protein sequences as evidence. More than half of my contigs are shorter than 300bp (their total length is only about 5% of the total length of all contigs). I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs.
>>>>>> >
>>>>>> > Thanks
>>>>>> >
>>>>>> > Best
>>>>>> > Quanwei
>>>>>> > _______________________________________________
>>>>>> > maker-devel mailing list
>>>>>> > maker-devel at box290.bluehost.com
>>>>>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chrisi.hahni at gmail.com Tue Mar 7 17:51:00 2017
From: chrisi.hahni at gmail.com (Christoph Hahn)
Date: Wed, 8 Mar 2017 01:51:00 +0100
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <119684F8-8071-4318-A129-3D90EC54242A@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com>
Message-ID: <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com>

Hi MAKER community,

I think I am seeing the same issue that Jason has reported. I ran cufflinks, then cufflinks2gff3, and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting gff file from maker I only get protein2genome and repeatmasker evidence. If I search the maker log, est2genome never comes up. I also tried to extract the cufflinks results as fasta and feed them to MAKER via 'est='. Still no indication that the evidence is used.

I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time!

cheers,
Christoph

On 10/02/2015 17:56, Carson Holt wrote:
> I ran a few est2genome runs with a cufflinks file I just generated and did not get any issues for EST based gene models.
>
> I'd like to at least have your test set to see if I can duplicate what you are seeing.
>
> Use this to upload the job files then I can just run it from my server here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>
> --Carson
>
>> On Feb 3, 2015, at 11:13 AM, Jason Gallant wrote:
>>
>> Hi Folks,
>>
>> I've nearly succeeded at getting MAKER to run on AWS... I've been checking the output files, and have noticed that none of my RNAseq data was incorporated on the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting gff3 files. I even ran the accessory script gff3_merge to check that the resulting file was properly formatted.
>>
>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting maker gff3 file, and the genes predicted by these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step.
>>
>> Any help would (as always) be greatly appreciated!
>>
>> Best,
>> Jason Gallant
>>
>> --
>> Dr. Jason R. Gallant
>> Assistant Professor
>> Room 38 Natural Sciences
>> Department of Zoology
>> Michigan State University
>> East Lansing, MI 48824
>> jgallant at msu.edu
>> office: 517-884-7756
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From o.k.torresen at ibv.uio.no Thu Mar 9 02:36:27 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Mar 2017 09:36:27 +0000
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
Message-ID: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no>

Hi all,

I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta.
Is 3.1 available somewhere, or is it going to be released soon?

I also saw a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued by the last sentence: "...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association." Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this.

Thank you.

Sincerely,
Ole

From carsonhh at gmail.com Thu Mar 9 10:52:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Mar 2017 10:52:30 -0700
Subject: [maker-devel] Differences in non_overlapping protein file between runs
In-Reply-To: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>
References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>
Message-ID:

My guess is that there is an issue with the GFF3 file you supplied, so its features are not overlapping anything.

--Carson

> On Mar 6, 2017, at 9:51 AM, YannDussert wrote:
>
> Hello,
>
> First, thank you for developing MAKER, this is a great annotation tool!
>
> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.
>
> After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000 and 17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation in a genome browser, one of the problems was apparently gene fusions due to bad protein evidence.
> Following the advice in another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence; did I miss something?
>
> Thank you in advance for your answer.
>
> Best regards,
> Yann
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Mar 9 11:39:11 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Mar 2017 11:39:11 -0700
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com>
Message-ID: <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com>

Jason never responded back to this one or uploaded his file to test. He probably figured it out off list.

My guess is that your results are too fragmented to build a model that can pass the filtering thresholds. If you want, I can take a look. You can upload all files for a test job here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

--Carson

> On Mar 7, 2017, at 5:51 PM, Christoph Hahn wrote:
>
> Hi MAKER community,
>
> I think I am seeing the same issue that Jason has reported. I ran cufflinks, then cufflinks2gff3, and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting gff file from maker I only get protein2genome and repeatmasker evidence. If I search the maker log, est2genome never comes up. I also tried to extract the cufflinks results as fasta and feed them to MAKER via 'est='. Still no indication that the evidence is used.
>
> I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time!
>
> cheers,
> Christoph
>
> On 10/02/2015 17:56, Carson Holt wrote:
>> I ran a few est2genome runs with a cufflinks file I just generated and did not get any issues for EST based gene models.
>>
>> I'd like to at least have your test set to see if I can duplicate what you are seeing.
>>
>> Use this to upload the job files then I can just run it from my server here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>
>> --Carson
>>
>>> On Feb 3, 2015, at 11:13 AM, Jason Gallant wrote:
>>>
>>> Hi Folks,
>>>
>>> I've nearly succeeded at getting MAKER to run on AWS... I've been checking the output files, and have noticed that none of my RNAseq data was incorporated on the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting gff3 files. I even ran the accessory script gff3merge to check that the resulting file was properly formatted.
>>>
>>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting maker gff3 file, and the genes predicted by these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step.
>>>
>>> Any help would (as always) be greatly appreciated!
>>>
>>> Best,
>>> Jason Gallant
>>>
>>> --
>>> Dr. Jason R. Gallant
>>> Assistant Professor
>>> Room 38 Natural Sciences
>>> Department of Zoology
>>> Michigan State University
>>> East Lansing, MI 48824
>>> jgallant at msu.edu
>>> office: 517-884-7756

From carsonhh at gmail.com Thu Mar 9 11:51:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Mar 2017 11:51:25 -0700
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
In-Reply-To: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no>
References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no>
Message-ID: <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com>

Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus.

It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome).

I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc.

It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) --> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure

--Carson

> On Mar 9, 2017, at 2:36 AM, Ole Kristian Tørresen wrote:
>
> Hi all,
> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately.
>
> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon?
>
> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence "...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association." Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this.
>
> Thank you.
>
> Sincerely,
> Ole
From lucys-world at mailbox.org Tue Mar 7 01:39:40 2017
From: lucys-world at mailbox.org (lucys-world at mailbox.org)
Date: Tue, 7 Mar 2017 09:39:40 +0100 (CET)
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
In-Reply-To: <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com>
References: <850873370.6534.1488811234072@office.mailbox.org> <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com>
Message-ID: <1407048207.7112.1488875981292@office.mailbox.org>

Hello Carson, hello Daniel,

thank you for your fast reply and help.

To Daniel's question: Yes, unfortunately I had protein2genome=1 in all runs.

To Carson: After reading a lot through the forum I figured that I had misunderstood ab initio gene prediction. I thought one had to perform 3 MAKER runs in total: one training run and then two MAKER runs for annotation. But now I think there are only two MAKER runs to perform in total (one training and then one annotation run), is that correct?

So after my first run I created an HMM based on the first gene-stats (with 7445 genes) and performed my second run with this HMM. Then I tried to create a new HMM based on my second run output. I think that is not necessary since the output of the second run should be my annotated genome?

I think I have to redo my MAKER runs and for that have two questions regarding the maker_opts.ctl:

1. Training run: For that I have to give MAKER my genome, my evidence (in my case BUSCO and Swiss-Prot data sets) and set protein2genome=1. Since that is my only evidence I don't change anything else? (I don't add anything in the gene prediction section?)

2. Annotation run: With the GFF output of the training run I create my own HMM from SNAP. In the maker_opts.ctl I then add my SNAP HMM for this annotation run and set augustus_species to the closest related species (as recommended in the Augustus manual), is that correct? Do I also give my protein evidence as I did in the training run?
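The two-run plan in the questions above can be written down as two sets of maker_opts.ctl values. This is only a sketch of the plan as described, not advice from the list: the option names (genome, protein, protein2genome, snaphmm, augustus_species) are maker_opts.ctl keys, but every file name is a placeholder, and keeping the protein evidence while switching protein2genome off in the second run are assumptions, since those are exactly the points being asked about.

```python
# Sketch of the two runs as option sets. All file names are placeholders.
TRAINING_RUN = {
    "genome": "genome.fasta",             # placeholder path
    "protein": "busco_swissprot.fasta",   # placeholder path for the protein evidence
    "protein2genome": "1",                # build gene models directly from protein alignments
}

ANNOTATION_RUN = {
    "genome": "genome.fasta",             # placeholder path
    "protein": "busco_swissprot.fasta",   # assumption: evidence is still supplied
    "protein2genome": "0",                # assumption: models now come from the predictors
    "snaphmm": "round1.hmm",              # placeholder name for the SNAP HMM from run 1
    "augustus_species": "human",          # placeholder; use the closest available species
}


def render_ctl(options):
    """Render an option dict as key=value lines in maker_opts.ctl style."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


print(render_ctl(TRAINING_RUN))
print()
print(render_ctl(ANNOTATION_RUN))
```

Only the lines that differ between the runs matter here; all other entries of maker_opts.ctl are left at whatever the generated control file already contains.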

Thank you very much for your time and help with that!

- Lucy

> Carson Holt wrote on 6 March 2017 at 20:48:
>
> It looks like you have no genes to train with. So you did something wrong on your second run. Either no gene predictor was running or you provided no evidence for the predictor, so you produced no models.
>
> --Carson
>
> > On Mar 6, 2017, at 7:40 AM, lucys-world at mailbox.org wrote:
> >
> > Dear maker-devel group,
> >
> > I have some issues with my maker ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP.
> >
> > After two maker runs I wanted to create a new HMM for the third maker run, but the command
> >
> > fathom genome.ann genome.dna -gene-stats
> >
> > resulted in 0 genes.
> >
> > What have I done so far:
> >
> > * For the first training run I only used BUSCO and the Swiss-Prot data bank as references (since no ESTs are available for my species). Additionally I set protein2genome=1.
> >
> > * I was able to create an HMM based on all merged *.gff, but these were not many:
> >   o out of 27,032 scaffolds (sequences) only 280 were used for the HMM; here are the gene-stats:
> >   o 280 sequences
> >     0.458676 avg GC fraction (min=0.338014 max=0.708052)
> >     7445 genes (plus=3192 minus=4253)
> >     1621 (0.217730) single-exon
> >     5824 (0.782270) multi-exon
> >     168.412018 mean exon (min=1 max=5224)
> >     1464.349243 mean intron (min=30 max=41197)
> >
> > * For the second maker run I then used this HMM and again the BUSCO+SwissPort.fasta reference file.
> >   o the gene-stats for the output of the second maker run are:
> >   o 282 sequences
> >     0.473125 avg GC fraction (min=0.338014 max=0.725131)
> >     0 genes (plus=0 minus=0)
> >     0 (-nan) single-exon
> >     0 (-nan) multi-exon
> >     -nan mean exon (min=2147483647 max=0)
> >     -nan mean intron (min=2147483647 max=0)
> >
> > Would you recommend to rerun everything, e.g.
> > with an additional Augustus gene prediction (species=human), or EST from related species? (If so how close related?)
> >
> > Thank you for your time and help
> >
> > kind regards
> >
> > Lucy

From o.k.torresen at ibv.uio.no Thu Mar 9 12:42:31 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Mar 2017 19:42:31 +0000
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
In-Reply-To: <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com>
References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com>
Message-ID: <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no>

Hi Carson.

In the article I linked to, The draft genome sequence and annotation of the desert woodrat Neotoma lepida (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/)."

So I guess this version is not available, or maybe they meant 3.0beta1 or something.

ACE looks like a really cool tool, I'll pass it on to people that have the correct datasets.

Thank you.

Ole

From carsonhh at gmail.com Thu Mar 9 12:50:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Mar 2017 12:50:10 -0700
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
In-Reply-To: <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no>
References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no>
Message-ID: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com>

My guess is that Michael may have called it 3.1 because he used the subversion repository which is beyond the 3.0-beta download but has not been packaged for release yet.

--Carson

From o.k.torresen at ibv.uio.no Thu Mar 9 12:55:00 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Mar 2017 19:55:00 +0000
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
In-Reply-To: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com>
References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com>
Message-ID:

Ah, thank you. That explains it.

Ole

From o.k.torresen at ibv.uio.no Thu Mar 9 12:59:35 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Mar 2017 19:59:35 +0000
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
In-Reply-To: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com>
References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com>
Message-ID: <0B73432A-E0EE-4983-8314-E8A94AADA74F@ibv.uio.no>

Ah, thank you. That explains it.

Ole

From chrisi.hahni at gmail.com Fri Mar 10 01:50:52 2017
From: chrisi.hahni at gmail.com (Christoph Hahn)
Date: Fri, 10 Mar 2017 09:50:52 +0100
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com>
Message-ID: <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com>

Dear Carson,

Thanks for getting in touch! I actually managed in the end. I converted the gtf I had from cufflinks to gff3 via the script 'gtf2gff.pl' from augustus and then used the script 'gffGetmRNA.pl' again from augustus to extract the mRNA in fasta. This file I fed to MAKER via the 'est=' route and now I get plenty of est2genome evidence in the maker result. So the problem seems to be limited to the route 'est_gff=': although there is no error message whatsoever, the est2genome routine seems to never be triggered.

I'd still be happy to upload my data (the cufflinks gff, the genome fasta, anything else?) if you want to try to reproduce the problem. Let me know!

btw I seem to be unable to create a new topic or respond to topics via google groups. Is the list closed or the access restricted somehow? I only managed by responding to Jason's mail which I still had in my inbox directly via my gmail.

Thanks!

cheers,
Christoph

On 09/03/2017 19:39, Carson Holt wrote:
> Jason never responded back to this one or uploaded his file to test. He probably figured it out off list.
My guess is that your results are > too fragmented to build a model that can pass filtering thresholds with. > > If you want I can take a look. You can upload all files for a test job > here ?> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi > > ?Carson > > > >> On Mar 7, 2017, at 5:51 PM, Christoph Hahn > > wrote: >> >> Hi MAKER community, >> >> I think I am seeing the same issue that Jason has reported. ran >> cufflinks, then cufflinks2gff3 and tried to feed the result to MAKER >> via 'est_gff=' with 'est2genome=1'. In the resulting gff file from >> maker I only get protein2genome and repeatmasker evidence. If I do a >> search in the maker log est2genome never comes up. Tried to extract >> the cufflinks results as fasta and feed to MAKER via 'est='. Still no >> indication that the evidence is used. >> >> I am using MAKER 2.31.8. Any help would be much appreciated! Thanks >> in advance for your time! >> >> cheers, >> Christoph >> >> On 10/02/2015 17:56, Carson Holt wrote: >>> I ran a few est2genome runs with a cufflinks file i just generated >>> and did not get any issues for EST based gene models. >>> >>> I?d like to at least have your test set to see if I can duplicate >>> what you are seeing. >>> >>> Use this to upload the job files then I can just run it from my >>> server here ?> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >>> >>> ?Carson >>> >>> >>>> On Feb 3, 2015, at 11:13 AM, Jason Gallant >>> > wrote: >>>> >>>> Hi Folks, >>>> >>>> I?ve nearly succeeded at getting MAKER to run on AWS? I?ve been >>>> checking the output files, and have noticed that none of my RNAseq >>>> data was incorporated on the run. I used Cufflinks to perform >>>> alignments of libraries from several tissues, ran the accessory >>>> script cufflinks2gff3 for each tissue, then concatenated the >>>> resulting gff3 files. I even ran the accessory script gff3merge to >>>> check that the resulting file was properly formatted. 
>>>> >>>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only >>>> get protein2genome and repeatmasker evidence in my resulting maker >>>> gff3 file, and the genes predicted by these. Is there another >>>> option that I need to enable in order to use my est_gff file? I?m >>>> trying to get a set of genes to train the predictors for my next step. >>>> >>>> Any help would (as always) be greatly appreciated! >>>> >>>> Best, >>>> Jason Gallant >>>> >>>> ? >>>> Dr. Jason R. Gallant >>>> Assistant Professor >>>> Room 38 Natural Sciences >>>> Department of Zoology >>>> Michigan State University >>>> East Lansing, MI 48824 >>>> jgallant at msu.edu >>>> office: 517-884-7756 >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dussert.yann at gmail.com Fri Mar 10 03:53:36 2017 From: dussert.yann at gmail.com (YannDussert) Date: Fri, 10 Mar 2017 11:53:36 +0100 Subject: [maker-devel] Differences in non_overlapping protein file between runs In-Reply-To: References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com> Message-ID: <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com> Hi, Thank you for your answer.To get my gff with ab-initio predictions, I just took the corresponding lines in the maker gff from the previous round. I can't see any problem with it, it looks like this: Plvit001 augustus_masked match 66626 70338 0.85 + . 
ID=Plvit001:hit:12095:4.5.0.0;Name=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 Plvit001 augustus_masked match_part 66626 67586 0.85 + . ID=Plvit001:hsp:27621:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1 961 +;Gap=M961 Plvit001 augustus match 66626 70338 1 + . ID=Plvit001:hit:12088:4.5.0.0;Name=augustus-Plvit001-abinit-gene-0.0-mRNA-1 Plvit001 augustus match_part 66626 70096 1 + . ID=Plvit001:hsp:27610:4.5.0.0;Parent=Plvit001:hit:12088:4.5.0.0;Target=augustus-Plvit001-abinit-gene-0.0-mRNA-1 1 3471 +;Gap=M3471 Plvit001 augustus_masked match_part 68166 68486 0.85 + . ID=Plvit001:hsp:27622:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 962 1282 +;Gap=M321 Plvit001 augustus_masked match_part 69504 70096 0.85 + . ID=Plvit001:hsp:27623:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1283 1875 +;Gap=M593 Plvit001 augustus_masked match_part 70174 70338 0.85 + . ID=Plvit001:hsp:27624:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1876 2040 +;Gap=M165 Best regards, Yann On 09/03/2017 18:52, Carson Holt wrote: > My guess is that there is either an issue with the GFF3 file you supplied, so its features are not overlapping anything. > > ?Carson > > >> On Mar 6, 2017, at 9:51 AM, YannDussert wrote: >> >> Hello, >> >> First, thank you for developing MAKER, this is a great annotation tool! >> >> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1. 
>>
>> After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000-17000 genes (numbers from other annotated oomycetes). There was only around 1500 genes in the non_overlapping protein file. After looking at the annotation on a genome browser, one of the problems was apparently gene fusions due to bad protein evidence. Following the advice on another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence, did I miss something?
>>
>> Thank you in advance for your answer.
>>
>> Best regards,
>> Yann
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ereboperezsilva at gmail.com Fri Mar 10 04:05:29 2017
From: ereboperezsilva at gmail.com (=?UTF-8?B?Sm9zw6kgTcKqIEcuIFBlcmV6LVNpbHZh?=)
Date: Fri, 10 Mar 2017 12:05:29 +0100
Subject: [maker-devel] ERROR: Chunk failed
Message-ID:

Hi!

I'm having some trouble understanding the ERROR I'm receiving. Recently I set up a new machine to annotate a genome (around 2 Gb) using Maker. We mounted a new 1 TB disk and loaded onto it the files of an incomplete annotation run (we started it on a different machine and moved it to this one, which has more processing power).
Apparently everything was ok, until at some point yesterday we received the following ERROR:

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig4633

We are running 16 jobs of maker at the same time, on the unsplit genome. We checked, and the "df" command reported that only 7% of the mounted disk was used. So space does not appear to be the problem... Why that error then?

Thanks for the help.

José María González Pérez-Silva.
PhD student at Universidad de Oviedo.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ereboperezsilva at gmail.com Fri Mar 10 10:21:38 2017
From: ereboperezsilva at gmail.com (=?UTF-8?B?Sm9zw6kgTcKqIEcuIFBlcmV6LVNpbHZh?=)
Date: Fri, 10 Mar 2017 18:21:38 +0100
Subject: [maker-devel] Maker ERROR
Message-ID:

Hi,

I wrote earlier today about a problem that (apparently) involved disk space. After I deleted some unnecessary files (despite having plenty of storage left), I killed all the processes and set 'clean_try=1' as recommended in this post. Before re-running the processes, we checked that there was no limit on the size of a directory or anything similar.

After re-running, all seemed correct at first, but when I re-checked some time later, I found a lot of contigs with the status FAILED and no folder specified in the '_master_datastore_index.log', looking like:

Contig480 FAILED
Contig496 FAILED
Contig512 FAILED
Contig528 FAILED
Contig544 FAILED
Contig560 FAILED...

But checking the 'nohup.out' of every process (16 in total, as the machine has 16 cores), I notice that each run does, from time to time, process a contig correctly. So, after several (a lot of) FAILED contigs, it processes one correctly.
As said in the previous email, the ERROR displayed in the nohup.out is (including the last part of a processed contig at the beginning):

...
#--------- command -------------#
Widget::blastx:
/usr/bin/blastall -p blastx -d /data/ge/tmp/maker_VfDQQU/hsap_ensembl%2Efa.mpi.10.6 -i /data/ge/tmp/maker_VfDQQU/0/Contig20.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 4 -U -F T -I T -o /data/ge/round3/cg.maker.output/cg_datastore/56/AC/Contig20//theVoid.Contig20/0/Contig20.0.hsap_ensembl%2Efa.blastx.temp_dir/hsap_ensembl%2Efa.mpi.10.6.blastx
#-------------------------------#
deleted:511 hits
doing blastx of proteins
open3: fork failed: Cannot allocate memory at /home/jmgps/software/maker/bin/../lib/File/NFSLock.pm line 1037.
--> rank=NA, hostname=Planarian2
ERROR: Failed while doing blastx of proteins
ERROR: Chunk failed at level:8, tier_type:3
FAILED CONTIG:Contig20

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:Contig20

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig22

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig24

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig26

examining contents of the fasta file and run log
ERROR: could not make datastore directory
--> rank=NA, hostname=Planarian2
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:Contig28
...

I'm totally lost here. I think it is still processing contigs, but the FAILED attempts slow down the whole process, and we are in a hurry because of upcoming maintenance on the machine. And I can't understand the source of the ERROR.

I will be more than happy to provide more details about the problem, if requested.

Thanks a lot for the help!
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Mar 10 10:34:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 Mar 2017 10:34:34 -0700
Subject: [maker-devel] Maker ERROR
In-Reply-To:
References:
Message-ID:

Several things.

1. MAKER does a lot of its work in a temporary directory (usually /tmp). This directory must be locally mounted and cannot be a network mounted location. If this location is full you can get issues.

2. MAKER needs at least 1GB of RAM per process (2-3GB is safer), so if you don't have enough RAM you may need to run fewer processes (with MPI multiply whatever you supplied to the mpiexec -n flag by 1GB).

3. If you are launching MAKER multiple times as opposed to launching once via MPI, you will exacerbate the above limitations as well as open up IO limitations. MAKER can and does saturate IO when run multiple times simultaneously (this is especially true for network mounted locations). If you run via MPI you can greatly reduce IO, so make sure you are using MPI and not just launching MAKER multiple times. If you absolutely have to start multiple jobs, you can reduce IO somewhat by splitting the input fasta into pieces (use fasta_tool). Give a separate piece to each job via maker's -g flag, and set -base so all results from all jobs get written to the same location. Then each job can avoid multiple file locks that would have been encountered by sharing input. Note that you must rebuild the datastore index using 'maker -dsindex' when all jobs complete.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Mar 14 10:16:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Mar 2017 10:16:25 -0600
Subject: [maker-devel] Differences in non_overlapping protein file between runs
In-Reply-To: <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com>
References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com> <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com>
Message-ID: <9EC90572-7E3F-4B07-9098-6CAFD7B3A4B0@gmail.com>

I see you have both masked and unmasked augustus calls, so you may have a lot of non-masked predictions in your second run that are entirely contained in transposons and repeat regions (that is why they do not overlap).

Really the easiest thing to do would be to open the results in a browser, find one of the ones listed as non-overlapping, and then look at it to see why it is not overlapping. You can then look at that specific location directly in the file as needed, but it will be much easier to interpret looking at the features drawn in a browser (like Apollo - desktop version).

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Mar 14 10:17:58 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Mar 2017 10:17:58 -0600
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com> <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com>
Message-ID:

Sure. Send me the file. On a side note, I find cufflinks results to be very noisy (lots of false positives). I usually get better results using assembled reads from Trinity (with -jaccard_clip option set), or using Stringtie.

Thanks,
Carson

> On Mar 10, 2017, at 1:50 AM, Christoph Hahn wrote:
>
> Dear Carson,
>
> Thanks for getting in touch! I actually managed in the end. I converted the gtf I had from cufflinks to gff3 via the script 'gtf2gff.pl' from augustus and then used the script 'gffGetmRNA.pl' again from augustus to extract the mRNA in fasta. This file I fed to MAKER via the 'est=' route and now I get plenty of est2genome evidence in the maker result.
So the problem seems to be limited to the route 'est_gff=', allthough there is no error message whatsoever the est2genome routine seems to never be triggered.
>
> I'd still be happy to upload my data (the cufflinks gff, the genome fasta, anything else?) if you want to try to reproduce the problem. Let me know!
>
> btw I seem to be unable to create a new topic or respond to topics via google groups. Is the list closed or the access restricted somehow. I only managed by responding to Jason's mail which I still had in my inbox directly via my gmail.
>
> Thanks!
>
> cheers,
> Christoph
>
> On 09/03/2017 19:39, Carson Holt wrote:
>> Jason never responded back to this one or uploaded his file to test. He probably figured it out off list. My guess is that your results are too fragmented to build a model that can pass filtering thresholds with.
>>
>> If you want I can take a look. You can upload all files for a test job here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>
>> --Carson
>>
>>
>>
>>> On Mar 7, 2017, at 5:51 PM, Christoph Hahn wrote:
>>>
>>> Hi MAKER community,
>>>
>>> I think I am seeing the same issue that Jason has reported. ran cufflinks, then cufflinks2gff3 and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting gff file from maker I only get protein2genome and repeatmasker evidence. If I do a search in the maker log est2genome never comes up. Tried to extract the cufflinks results as fasta and feed to MAKER via 'est='. Still no indication that the evidence is used.
>>>
>>> I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time!
>>>
>>> cheers,
>>> Christoph
>>>
>>> On 10/02/2015 17:56, Carson Holt wrote:
>>>> I ran a few est2genome runs with a cufflinks file i just generated and did not get any issues for EST based gene models.
>>>>
>>>> I'd like to at least have your test set to see if I can duplicate what you are seeing.
>>>>
>>>> Use this to upload the job files then I can just run it from my server here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>
>>>> --Carson
>>>>
>>>>
>>>>> On Feb 3, 2015, at 11:13 AM, Jason Gallant wrote:
>>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I've nearly succeeded at getting MAKER to run on AWS... I've been checking the output files, and have noticed that none of my RNAseq data was incorporated on the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting gff3 files. I even ran the accessory script gff3merge to check that the resulting file was properly formatted.
>>>>>
>>>>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting maker gff3 file, and the genes predicted by these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step.
>>>>>
>>>>> Any help would (as always) be greatly appreciated!
>>>>>
>>>>> Best,
>>>>> Jason Gallant
>>>>>
>>>>> --
>>>>> Dr. Jason R. Gallant
>>>>> Assistant Professor
>>>>> Room 38 Natural Sciences
>>>>> Department of Zoology
>>>>> Michigan State University
>>>>> East Lansing, MI 48824
>>>>> jgallant at msu.edu
>>>>> office: 517-884-7756

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mnaymik at tgen.org Tue Mar 14 11:29:49 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Tue, 14 Mar 2017 10:29:49 -0700
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
Message-ID:

I have now tried with multiple versions of blast (2.6 and 2.28 binaries and built from source) and get the same error:

setting up GFF3 output and fasta chunks
doing blastn of ESTs
running blast search.
#--------- command -------------#
Widget::blastn:
/home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db /scratch/mnaymik/maker/tmp/maker_cah
#-------------------------------#
Error: NCBI C++ Exception:
"/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr

Error: NCBI C++ Exception:
"/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr

examining contents of the fasta file and run log
ERROR: BLASTN failed
--> rank=87, hostname=pnap-pe7-s09
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:6537645

ERROR: BLASTN failed
--> rank=88, hostname=pnap-pe7-s09
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:6537659

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:6537645

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:6537659

On Thu, Mar 2, 2017 at 1:25 PM, Carson Holt wrote:

> Try reinstalling blast, or upgrade to a newer version of blast.
>
> --Carson
>
>
> On Mar 2, 2017, at 1:05 PM, Marcus Naymik wrote:
>
>
> I have maker running with MPI and I get this error over and over again for
> every contig. Any Ideas?
>
>
> MAKER WARNING: All old files will be erased before continuing
>
> #---------------------------------------------------------------------
>
> Now starting the contig!!
>
> SeqID: 5239
>
> Length: 1395
>
> #---------------------------------------------------------------------
>
>
>
> Error: NCBI C++ Exception:
>
> "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp",
> line 925: Criti
>
>
>
> *This electronic message is intended to be for the use only of the named
> recipient, and may contain information that is confidential or privileged,
> including patient health information. If you are not the intended
> recipient, you are hereby notified that any disclosure, copying,
> distribution or use of the contents of this message is strictly prohibited.
> If you have received this message in error or are not the named recipient,
> please notify us immediately by contacting the sender at the electronic
> mail address noted above, and delete and destroy all copies of this
> message. Thank you.*
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Mar 14 11:36:07 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Mar 2017 11:36:07 -0600
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To:
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
Message-ID:

The error itself is coming from BLAST. MAKER does provide the command used, so you can try it outside of MAKER. You can submit the files used as well as command used to the BLAST developers for them to test with.

MAKER deletes files on failure, but if you edit the .../maker/lib/GI.pm, you can stop it from deleting files.

Edit line 58 by setting CLEANUP => 0

Then you should be able to grab whatever files maker used to run blast, and copy the blast command used from STDERR.
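
[Editorial sketch, not part of MAKER and not from the original poster.] The log excerpts in this thread show each external command printed between a "#--------- command -------------#" line and a "#-------------------------------#" line, with a "Widget::..." label on the first line inside. Assuming that layout holds, and assuming STDERR was saved to a file, something like the following Python could pull the commands out so they can be rerun by hand. The function name extract_commands and the sample log text are hypothetical, and the paths in the sample are placeholders.

```python
import re

# Delimiters as they appear in the MAKER log excerpts quoted in this thread.
START = re.compile(r"^#-+ command -+#$")
END = re.compile(r"^#-+#$")


def extract_commands(log_text):
    """Return (label, command) pairs found between the command delimiters."""
    commands = []
    inside = False
    block = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if START.match(line):
            # A new command block begins; discard any unfinished one.
            inside, block = True, []
        elif inside and END.match(line):
            inside = False
            if block:
                # The first line is assumed to be a label such as "Widget::blastn:".
                has_label = block[0].startswith("Widget::")
                label = block[0].rstrip(":") if has_label else ""
                body = block[1:] if has_label else block
                commands.append((label, " ".join(body)))
        elif inside and line:
            block.append(line)
    return commands


if __name__ == "__main__":
    # Hypothetical sample shaped like the excerpts above; paths are placeholders.
    sample = """doing blastn of ESTs
running blast search.
#--------- command -------------#
Widget::blastn:
/path/to/blastn -db /path/to/db -query /path/to/chunk.fasta
#-------------------------------#
ERROR: BLASTN failed
"""
    for label, command in extract_commands(sample):
        print(label)
        print(command)
```

In practice one would read the saved STDERR file, pass its text to the function, and paste the recovered command into a shell after setting CLEANUP => 0 so the input files are still on disk.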

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Tue Mar 14 20:27:10 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 14 Mar 2017 22:27:10 -0400
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <2017030519265949065818@cau.edu.cn>
References: <2017030519265949065818@cau.edu.cn>
Message-ID: <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com>

Hi Chao Chao,

I've not run into this before. Could you post the RepeatModeler command you used?

Thanks,
Mike

> On Mar 5, 2017, at 6:26 AM, dcg at cau.edu.cn wrote:
>
> Dear sir:
> Before the maker opeations, I do repeat masking first on my contigs.
> However , when I followed " Repeat Library Construction-Advanced ", no results generated after I running LTRharvest. So I couldn't do any further.
>
> When I attempted to follow" Repeat Library Construction-Basic " to run RepeatModeler, a note caused my attention even though RECON can return some results :
> NOTE: RepeatScout did not return any models.
>
> Is the situation above normal in masking progress? How can I deal with the problems to make a high-quality repeat library for my assemblied contigs?
>
> Hope to hear from you.
> Best wishes!
>
> Chao Chao
> 2017.03.05
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dcg at cau.edu.cn Wed Mar 15 08:26:15 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Wed, 15 Mar 2017 22:26:15 +0800
Subject: [maker-devel] How to get Pseudogene
Message-ID: <2017031522261575294011@cau.edu.cn>

Dear sir:

I'd like to mask some pseudogenes in my annotation. How can I do it?

In the guide, the first step is "Run a tblastn of the protein sequence (query) vs. the intergenic genome sequence (subject/database)".

My question is: what do "the protein sequence" and "the intergenic genome sequence" each refer to? My own protein database? And how do I use the result in the maker annotation?

Best wishes!

Chao Chao
2017.03.15
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Wed Mar 15 09:00:13 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 15 Mar 2017 11:00:13 -0400
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <201703152048212561203@cau.edu.cn>
References: <2017030519265949065818@cau.edu.cn> <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com> <201703152048212561203@cau.edu.cn>
Message-ID: <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com>

Hi Chao Chao,

I'm not sure how to troubleshoot this if there were no error messages. I've cc'd a couple of people who have worked with this protocol much more than I have.

Ning and Kevin,

Do you have any tips for running these tools that may help Chao Chao?

Thanks,
Mike

> On Mar 15, 2017, at 8:48 AM, dcg at cau.edu.cn wrote:
>
> Thank for your reply!
> I just followed the guide iat http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced
>
> To use LTRHarvest, my command is as below(the filename was set for my favor)
> DIR1/gt suffixerator -db seqfile -indexname seqfileindex -tis -suf -lcp -des -ssp -dna
> DIR1/gt ltrharvest -index seqfileindex -out seqfile.out99 -outinner seqfile.outinner99 -gff3 seqfile.gff99 -minlenltr 100 \
> -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > seqfile.result99
> No error, but no results as well
>
> Chao Chao
> 2017.03.15
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mnaymik at tgen.org Wed Mar 15 10:54:48 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Wed, 15 Mar 2017 09:54:48 -0700
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To:
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
Message-ID:

Thanks, you're right. I had to recompile blast from src with this flag: -std=c++0x

On Tue, Mar 14, 2017 at 10:36 AM, Carson Holt wrote:

> The error itself is coming from BLAST. MAKER does provide the command
> used, so you can try it outside of MAKER. You can submit the files used as
> well as command used to the BLAST developers for them to test with.
>
> MAKER deletes files on failure, but if you edit the .../maker/lib/GI.pm, you
> can stop it from deleting files.
>
> Edit line 58 by setting CLEANUP => 0
>
> Then you should be able to grab whatever files maker used to run blast,
> and copy the blast command used from STDERR.
>
> --Carson
>
> On Mar 14, 2017, at 11:29 AM, Marcus Naymik wrote:
>
> I have now tried with multiple versions of blast (2.6 and 2.28 binaries
> and built from source) and get the same error:
>
> setting up GFF3 output and fasta chunks
> doing blastn of ESTs
> running blast search.
> #--------- command -------------#
> Widget::blastn:
> /home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db /scratch/mnaymik/maker/tmp/maker_cah
> #-------------------------------#
> Error: NCBI C++ Exception:
> "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr
>
> Error: NCBI C++ Exception:
> "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr
>
> examining contents of the fasta file and run log
> ERROR: BLASTN failed
> --> rank=87, hostname=pnap-pe7-s09
> ERROR: Failed while doing blastn of ESTs
> ERROR: Chunk failed at level:0, tier_type:3
> FAILED CONTIG:6537645
>
> ERROR: BLASTN failed
> --> rank=88, hostname=pnap-pe7-s09
> ERROR: Failed while doing blastn of ESTs
> ERROR: Chunk failed at level:0, tier_type:3
> FAILED CONTIG:6537659
>
> ERROR: Chunk failed at level:4, tier_type:0
> FAILED CONTIG:6537645
>
> ERROR: Chunk failed at level:4, tier_type:0
> FAILED CONTIG:6537659
>
> On Thu, Mar 2, 2017 at 1:25 PM, Carson Holt wrote:
>
>> Try reinstalling blast, or upgrade to a newer version of blast.
>>
>> --Carson
>>
>> On Mar 2, 2017, at 1:05 PM, Marcus Naymik wrote:
>>
>> I have maker running with MPI and I get this error over and over again
>> for every contig. Any ideas?
>>
>> MAKER WARNING: All old files will be erased before continuing
>> #---------------------------------------------------------------------
>> Now starting the contig!!
>> SeqID: 5239
>> Length: 1395
>> #---------------------------------------------------------------------
>>
>> Error: NCBI C++ Exception:
>> "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti
>>
>> *This electronic message is intended to be for the use only of the named
>> recipient, and may contain information that is confidential or privileged,
>> including patient health information. If you are not the intended
>> recipient, you are hereby notified that any disclosure, copying,
>> distribution or use of the contents of this message is strictly prohibited.
>> If you have received this message in error or are not the named recipient,
>> please notify us immediately by contacting the sender at the electronic
>> mail address noted above, and delete and destroy all copies of this
>> message. Thank you.*

From carsonhh at gmail.com  Wed Mar 15 11:00:18 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 15 Mar 2017 11:00:18 -0600
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To:
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
Message-ID: <6A6C819F-D903-401A-8522-29FEBC955F17@gmail.com>

Glad I could help. Remember to switch back to CLEANUP => 1 if you set it to 0 to debug. Otherwise you will have a lot of files left in /tmp after each MAKER run.

--Carson

> On Mar 15, 2017, at 10:54 AM, Marcus Naymik wrote:
>
> Thanks, you're right. I had to recompile BLAST from source with this flag: -std=c++0x
>
> On Tue, Mar 14, 2017 at 10:36 AM, Carson Holt wrote:
> The error itself is coming from BLAST. MAKER does provide the command used, so you can try it outside of MAKER. You can submit the files used as well as the command used to the BLAST developers for them to test with.
>
> MAKER deletes files on failure, but if you edit .../maker/lib/GI.pm, you can stop it from deleting files.
>
> Edit line 58 by setting CLEANUP => 0
>
> Then you should be able to grab whatever files MAKER used to run BLAST, and copy the BLAST command used from STDERR.
>
> --Carson
>
>> On Mar 14, 2017, at 11:29 AM, Marcus Naymik wrote:
>>
>> I have now tried with multiple versions of blast (2.6 and 2.28 binaries and built from source) and get the same error:
>>
>> setting up GFF3 output and fasta chunks
>> doing blastn of ESTs
>> running blast search.
>> #--------- command -------------#
>> Widget::blastn:
>> /home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db /scratch/mnaymik/maker/tmp/maker_cah
>> #-------------------------------#
>> Error: NCBI C++ Exception:
>> "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr
>>
>> Error: NCBI C++ Exception:
>> "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr
>>
>> examining contents of the fasta file and run log
>> ERROR: BLASTN failed
>> --> rank=87, hostname=pnap-pe7-s09
>> ERROR: Failed while doing blastn of ESTs
>> ERROR: Chunk failed at level:0, tier_type:3
>> FAILED CONTIG:6537645
>>
>> ERROR: BLASTN failed
>> --> rank=88, hostname=pnap-pe7-s09
>> ERROR: Failed while doing blastn of ESTs
>> ERROR: Chunk failed at level:0, tier_type:3
>> FAILED CONTIG:6537659
>>
>> ERROR: Chunk failed at level:4, tier_type:0
>> FAILED CONTIG:6537645
>>
>> ERROR: Chunk failed at level:4, tier_type:0
>> FAILED CONTIG:6537659
>>
>> On Thu, Mar 2, 2017 at 1:25 PM, Carson Holt wrote:
>> Try reinstalling blast, or upgrade to a newer version of blast.
>>
>> --Carson
>>
>>> On Mar 2, 2017, at 1:05 PM, Marcus Naymik wrote:
>>>
>>> I have maker running with MPI and I get this error over and over again for every contig. Any ideas?
>>>
>>> MAKER WARNING: All old files will be erased before continuing
>>> #---------------------------------------------------------------------
>>> Now starting the contig!!
>>> SeqID: 5239
>>> Length: 1395
>>> #---------------------------------------------------------------------
>>>
>>> Error: NCBI C++ Exception:
>>> "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti

From jiangn at msu.edu  Wed Mar 15 09:56:30 2017
From: jiangn at msu.edu (Jiang, Ning)
Date: Wed, 15 Mar 2017 15:56:30 +0000
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com>
References: <2017030519265949065818@cau.edu.cn> <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com> <201703152048212561203@cau.edu.cn>, <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com>
Message-ID:

Hi Chao Chao,

I guess you have an extra "\" in your second command. We put that sign there to indicate that the entire thing belongs to one command (it is too long to put on one line). I suggest you remove the "\" and try again.

Good luck!

Ning Jiang

________________________________
From: Michael Campbell
Sent: Wednesday, March 15, 2017 11:00:13 AM
To: dcg at cau.edu.cn
Cc: maker-devel; Jiang, Ning; Kevin Childs
Subject: Re: [maker-devel] For help about masking repeats before annotation

Hi Chao Chao,

I'm not sure how to troubleshoot this if there were no error messages. I've cc'd a couple of people who have worked with this protocol much more than I have.

Ning and Kevin,
Do you have any tips for running these tools that may help Chao Chao?

Thanks,
Mike

On Mar 15, 2017, at 8:48 AM, dcg at cau.edu.cn wrote:

Thanks for your reply!
I just followed the guide at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced

To use LTRharvest, my commands are as below (the file names were chosen by me):
DIR1/gt suffixerator -db seqfile -indexname seqfileindex -tis -suf -lcp -des -ssp -dna
DIR1/gt ltrharvest -index seqfileindex -out seqfile.out99 -outinner seqfile.outinner99 -gff3 seqfile.gff99 -minlenltr 100 \
-maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > seqfile.result99
No error, but no results either.

Chao Chao
________________________________
2017.03.15

From: Michael Campbell
Date: 2017-03-15 10:27
To: dcg
CC: maker-devel
Subject: Re: [maker-devel] For help about masking repeats before annotation

Hi Chao Chao,

I've not run into this before. Could you post the RepeatModeler command you used?

Thanks,
Mike

On Mar 5, 2017, at 6:26 AM, dcg at cau.edu.cn wrote:

Dear sir:
Before the MAKER operations, I do repeat masking first on my contigs.
However, when I followed "Repeat Library Construction-Advanced", no results were generated after I ran LTRharvest, so I couldn't go any further.

When I attempted to follow "Repeat Library Construction-Basic" to run RepeatModeler, a note caught my attention even though RECON did return some results:
NOTE: RepeatScout did not return any models.

Is the situation above normal in the masking process? How can I deal with these problems to make a high-quality repeat library for my assembled contigs?

Hope to hear from you.
Best wishes!

Chao Chao
________________________________
2017.03.05
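
Ning's point about the stray backslash can be checked without running GenomeTools. The sketch below uses Python's `shlex` module, which only approximates POSIX shell word splitting, to show what happens if the wiki's line-continuation "\" ends up in the middle of a command typed on a single line: the backslash escapes the space after it, so the next option reaches the program with a leading space instead of as a recognizable flag. The command string is a shortened copy of the one quoted above, and the idea that the command was entered on one line is an assumption, not something the thread confirms.

```python
import shlex

# Shortened copy of the quoted ltrharvest call, typed on ONE line with the
# wiki's continuation backslash left in (an assumption about what happened).
with_backslash = r"gt ltrharvest -index seqfileindex -minlenltr 100 \ -maxlenltr 6000"

# The same call with the backslash removed, as Ning suggests.
without_backslash = "gt ltrharvest -index seqfileindex -minlenltr 100 -maxlenltr 6000"


def split_like_shell(command):
    """Split a command line into arguments roughly the way a POSIX shell would."""
    return shlex.split(command)


broken_args = split_like_shell(with_backslash)
clean_args = split_like_shell(without_backslash)

for label, args in (("with backslash", broken_args),
                    ("without backslash", clean_args)):
    print(label, "->", args)
```

A backslash that is the very last character on a line is a different case: the shell then joins the two lines, which is what the wiki intends.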
From carsonhh at gmail.com  Thu Mar 16 09:19:02 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 16 Mar 2017 09:19:02 -0600
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To:
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu> <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com> <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com> <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com> <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com> <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com>
Message-ID: <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>

Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR (or hint-based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied.

Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but they are only there for reference purposes (there will also be a raw fasta from each source, again only for reference).

All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail on how models compete in the MAKER wiki and the MAKER2 paper.

So the final result is not a superset, but rather a merged subset from each potential source.

EVM is not used to obtain a consensus gene model. Its results compete just like those of all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn't work it produces either no model or partial models.

--Carson

> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>
> Dear Carson,
>
> thank you so much! I am now peeking into the results for the finished scaffolds. In the gff file, the gene id confuses me a bit.
> In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm", "augustus" etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?
>
> Best Regards,
> Ray
>
> Dr. Rongfeng (Ray) Cui
> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
> Wissenschaftlicher MA / Postdoctoral researcher
> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
> Tel.:+49 (0)221 496
> Mobile: +49 0221 37970 496
> rcui at age.mpg.de
> www.age.mpg.de
>
> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
> Maybe. I haven't tested this, but it should work. MAKER supports labels for input by placing a ':' and a label after each file name.
>
> Example -->
> est=file1.fasta:label_1,file2.fasta:label_2
>
> If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.
>
> As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3 -->
> evmtrans:est2genome:label_1=10
>
> I don't know if the label will force any raw analysis to rerun, but it shouldn't.
>
> --Carson
>
>> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote:
>>
>> Hi Carson,
>>
>> currently I am partitioning the protein evidence, based on phylogenetic relationship, into several datasets supplied as a comma-delimited list. Is it possible then to specify a higher weight for protein2genome models from closely related species than for more distantly related taxa?
>>
>> Ray
>>
>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui wrote:
>> Dear Carson,
>>
>> thank you for the pointers! Before running the first round of MAKER, I mapped conspecific Trinity-assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained GeneMark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see.
>>
>> It is good to know that MAKER will reuse the old results.
>>
>> Best Regards,
>> Ray
>>
>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt wrote:
>> You can find lots of info on training in the devel archives. Example --> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>>
>> There is also an example of training SNAP on the wiki --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>>
>> MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them.
>> The exception to this is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. Also, they are not archived because doing so created IO issues (exonerate is not run in bulk batches like BLAST, but as multiple small separate runs for each polished read, and archiving a lot of small raw reports can happen so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate results rather than develop a database-like bundling/compression mechanism to get around the IO issues.
>>
>> Thanks,
>> Carson
>>
>>> On Mar 14, 2017, at 10:44 AM, Ray Cui wrote:
>>>
>>> Hi Carson,
>>> Thanks for your prompt response!
>>>
>>> I have a somewhat unrelated question. After the first run of MAKER, I want to train Augustus, SNAP and GeneMark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models?
>>> After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc.) in the second iteration, since they are exactly the same as in the first run. Perhaps MAKER can take in the alignment results from the previous run?
>>>
>>> Best Regards,
>>> Ray
>>>
>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui wrote:
>>> I see. If my evm config looks like this:
>>> evmab=5 #default weight for source unspecified ab initio predictions
>>> evmab:snap=5 #weight for snap sourced predictions
>>> evmab:augustus=10 #weight for augustus sourced predictions
>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions
>>> evmab:genemark=5 #weight for genemark sourced predictions
>>>
>>> and column 2 in the genemark.gff is "GeneMark.hmm", then the value from "evmab" (=5) will be used, is that correct?
>>>
>>> Best Regards,
>>> Ray
>>>
>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt wrote:
>>> Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. That column will also be used by EVM to bin features by their source and apply weights based on source.
>>>
>>> --Carson
>>>
>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui wrote:
>>>>
>>>> Thanks! I didn't know you can also name the gff, but I think using the default is fine; that's what I'm doing now.
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt wrote:
>>>>
>>>> These are set in the maker_evm.ctl file.
>>>>
>>>> Use whatever you used in the source column of the input GFF3. For example, if column 2 is set as GENEMARK, then do this -->
>>>> evmab:GENEMARK=7
>>>>
>>>> This also works -->
>>>> evmab:pred_gff:GENEMARK=7
>>>>
>>>> Or just set the default -->
>>>> evmab=7
>>>>
>>>> --Carson
>>>>
>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui wrote:
>>>>>
>>>>> Dear Carson,
>>>>>
>>>>> I think it may be most straightforward to input the GFF3 instead.
>>>>>
>>>>> What is the correct way of setting a weight for the EVM step for GFF3 models passed through the pred_gff option?
>>>>>
>>>>> Ray
>>>>>
>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt wrote:
>>>>> It may work as is, as long as you don't need any of the additional options that have been added. If not, you can also just run it outside of MAKER and then provide the result in GFF3 format to pred_gff.
>>>>>
>>>>> --Carson
>>>>>
>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui wrote:
>>>>>>
>>>>>> I see. Are there any plans to incorporate it into MAKER soon?
>>>>>>
>>>>>> If not, I could try to see if I can adapt the current MAKER script.
>>>>>>
>>>>>> Ray
>>>>>>
>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt wrote:
>>>>>> Yes. This is a recent update. It's an attempt to merge GeneMark-ET and GeneMark-EP into the GeneMark-ES scripts.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui wrote:
>>>>>>>
>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap.
>>>>>>>
>>>>>>> I think there must have been a big update between different GeneMark versions. It seems that they now also support evidence being fed into the prediction stage.
>>>>>>>
>>>>>>> The name of the latest version of the GeneMark script has been changed to "gmes_petap.pl", with the following command line options:
>>>>>>>
>>>>>>> Usage: /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] --sequence [filename]
>>>>>>>
>>>>>>> GeneMark-ES Suite version 4.33
>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
>>>>>>>
>>>>>>> Input sequence/s should be in FASTA format
>>>>>>>
>>>>>>> Algorithm options
>>>>>>>   --ES to run self-training
>>>>>>>   --fungus to run algorithm with branch point model (most useful for fungal genomes)
>>>>>>>   --ET [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
>>>>>>>   --et_score [number]; 4 (default) minimum score of intron in initiation of the ET algorithm
>>>>>>>   --evidence [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
>>>>>>>   --training_only to run only training step
>>>>>>>   --prediction_only to run only prediction step
>>>>>>>   --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)
>>>>>>>
>>>>>>> Sequence pre-processing options
>>>>>>>   --max_contig [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
>>>>>>>   --min_contig [number]; 50000 (default); will ignore contigs shorter then min_contig in training
>>>>>>>   --max_gap [number]; 5000 (default); will split sequence at gaps longer than max_gap
>>>>>>>       Letters 'n' and 'N' are interpreted as standing within gaps
>>>>>>>   --max_mask [number]; 5000 (default); will split sequence at repeats longer then max_mask
>>>>>>>       Letters 'x' and 'X' are interpreted as results of hard masking of repeats
>>>>>>>   --soft_mask [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
>>>>>>>
>>>>>>> Run options
>>>>>>>   --cores [number]; 1 (default) to run program with multiple threads
>>>>>>>   --pbs to run on cluster with PBS support
>>>>>>>   --v verbose
>>>>>>>
>>>>>>> Customizing parameters:
>>>>>>>   --max_intron [number]; default 10000 (3000 fungi), maximum length of intron
>>>>>>>   --max_intergenic [number]; default 10000, maximum length of intergenic regions
>>>>>>>   --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
>>>>>>>
>>>>>>> Developer options:
>>>>>>>   --usr_cfg [filename]; to customize configuration file
>>>>>>>   --ini_mod [filename]; use this file with parameters for algorithm initiation
>>>>>>>   --test_set [filename]; to evaluate prediction accuracy on the given test set
>>>>>>>   --key_bin
>>>>>>>   --debug
>>>>>>> # -------------------
>>>>>>>
>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt wrote:
>>>>>>> Also note that the gmhmme3 executable distributed with different flavors of GeneMark has had the same name but has been quite different in both command line structure and output between flavors.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui wrote:
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters automatically set by MAKER when calling GeneMark?
>>>>>>>> If you can point me to the part of the MAKER source code that constructs the final GeneMark command line, I can also take a look.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Ray
>>>>>>>>
>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt wrote:
>>>>>>>> The names of the scripts used are listed in the maker_exe.ctl file. Whether it works depends on whether formatting or any flags have changed between versions.
>>>>>>>>
>>>>>>>> --Carson
>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui wrote:
>>>>>>>>>
>>>>>>>>> Dear Carson,
>>>>>>>>>
>>>>>>>>> I have now run GeneMark-ET, and it produces a trained .mod file. I think it can then be passed to MAKER. Do you know what the final constructed command line in MAKER that calls GeneMark is? GeneMark-ET and GeneMark-ES use the same Perl script, so one probably only needs to use the --prediction and --predict_with xxx.mod options to predict genes using the species-specific parameters (bypassing the regular training and prediction steps).
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Ray
>>>>>>>>>
>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt wrote:
>>>>>>>>> MAKER support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER-related archive posts etc. will be about GeneMark-ES.
>>>>>>>>>
>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don't know how this compares to GeneMark-ET.
>>>>>>>>>
>>>>>>>>> --Carson
>>>>>>>>>
>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> thanks! It seems that GeneMark-ET has a "--training" flag. Is that the flag I should use when training, or should I just let GeneMark also perform the prediction?
>>>>>>>>>>
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel wrote:
>>>>>>>>>> Hi Ray,
>>>>>>>>>>
>>>>>>>>>> I think you're on the right track with training GeneMark with RNAseq data. It should only change the training steps, which are external to MAKER, but not how MAKER runs GeneMark. You'll still give MAKER the path to the "es.mod" file made by GeneMark.
>>>>>>>>>>
>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence.
>>>>>>>>>>
>>>>>>>>>> ~Daniel
>>>>>>>>>>
>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I have successfully installed MAKER beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictors.
>>>>>>>>>>> When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate GeneMark-ET predictions into MAKER. Should I run GeneMark-ET independently of MAKER, then integrate the GFF at some point during the MAKER process? If so, how should I edit the configuration file? Currently MAKER has an option called "gmhmm". Should I then train GeneMark by myself with RNAseq data, then feed the hmm to MAKER?
>>>>>>>>>>>
>>>>>>>>>>> And a perhaps unrelated question: MAKER beta 3 now supports EVM. I'm wondering how EVM is used by MAKER (at which step, and what does it do), and how it differs from what MAKER is designed for (both reconcile different gene models).
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Ray
From rcui at age.mpg.de Thu Mar 16 10:02:08 2017
From: rcui at age.mpg.de (Ray Cui)
Date: Thu, 16 Mar 2017 17:02:08 +0100
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To: <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu>
 <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com>
 <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com>
 <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com>
 <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com>
 <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com>
 <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
Message-ID:

Dear Carson,

thank you for the explanation! Now I see why sometimes it seems that EVM doesn't produce any model for a particular cluster.

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile: +49 0221 37970 496
rcui at age.mpg.de
www.age.mpg.de

On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt wrote:

> Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR (or hint based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied. Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but are only there for reference purposes (there will also be a raw fasta from each source, but only for reference purposes). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail in the MAKER wiki and the MAKER2 paper for how models compete.
>
> So the final result is not a superset, rather a merged subset from each potential source.
>
> EVM is not used to obtain a consensus gene model. Its results compete just like all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn't work it produces either no model or partial models.
>
> --Carson
>
> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>
> Dear Carson,
>
> thank you so much! I am now peeking into the results for the finished scaffolds. In the gff file, the gene id confuses me a bit. In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm", "augustus" etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?
>
> Best Regards,
> Ray
>
> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
>
>> Maybe. I haven't tested this, but it should work. Maker supports labels for input by placing a ":" and a label after each file name.
>>
>> Example:
>> est=file1.fasta:label_1,file2.fasta:label_2
>>
>> If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.
>>
>> As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3:
>> evmtrans:est2genome:label_1=10
>>
>> I don't know if the label will force any raw analysis to rerun, but it shouldn't.
>>
>> --Carson
>>
>> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote:
>>
>> Hi Carson,
>>
>> currently I am partitioning the protein evidence based on phylogenetic relationship into several datasets, supplied as a comma delimited list. Is it possible then to specify a higher weight for protein2genome models from more closely related species than from more distantly related taxa?
>>
>> Ray
>>
>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui wrote:
>>
>>> Dear Carson,
>>>
>>> thank you for the pointers! Before running the first round of Maker, I mapped conspecific Trinity assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained Genemark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see.
>>>
>>> It is good to know that MAKER will reuse the old results.
>>>
>>> Best Regards,
>>> Ray
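The labelling convention from Carson's Mar 15 reply quoted above can be written out as a small Python sketch. It only reproduces the string conventions stated in that reply (file:label pairs in the input option, analysis:label in GFF3 column 2, and an EVM weight line keyed on that column 2 value). The helper names are hypothetical and not part of MAKER, the file names and the weight are the placeholders from the reply, and Carson noted that he had not tested the EVM part.

```python
def labelled_option(files_to_labels):
    """Build the value of an input option such as est= as 'file:label' pairs."""
    return ",".join(f"{path}:{label}" for path, label in files_to_labels.items())


def gff3_source(analysis, label):
    """Column 2 value described for labelled evidence, e.g. est2genome:label_1."""
    return f"{analysis}:{label}"


def evm_weight_line(evidence_class, source, weight):
    """A weight line keyed on the column 2 value (hypothetical helper)."""
    return f"{evidence_class}:{source}={weight}"


# Placeholder file names and labels taken from the example in the thread.
est_files = {"file1.fasta": "label_1", "file2.fasta": "label_2"}

print("est=" + labelled_option(est_files))
source = gff3_source("est2genome", "label_1")
print(source)
print(evm_weight_line("evmtrans", source, 10))
```

The point of keeping the three helpers separate is that the label has to be spelled identically in all three places, otherwise the weight line would not match column 2.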
>>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt wrote:
>>>
>>>> You can find lots of info in the devel archives on training. Example:
>>>> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>>>>
>>>> Also an example of training SNAP on the wiki:
>>>> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>>>>
>>>> MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them. The exception to this is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. Also they are not archived because doing so created IO issues (exonerate is not running in bulk batches like BLAST, rather as multiple small separate runs for each polished read, and archiving a lot of small raw reports can occur so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate rather than develop a database like bundling/compression mechanism to get around the IO issues.
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>> On Mar 14, 2017, at 10:44 AM, Ray Cui wrote:
>>>>
>>>> Hi Carson,
>>>> Thanks for your prompt response!
>>>>
>>>> I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models?
>>>> After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc) in the second iteration, since they are exactly the same as in the first run. Perhaps maker can take in the alignment results from the previous run?
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui wrote:
>>>>
>>>>> I see. If my evm config looks like this:
>>>>> evmab=5 #default weight for source unspecified ab initio predictions
>>>>> evmab:snap=5 #weight for snap sourced predictions
>>>>> evmab:augustus=10 #weight for augustus sourced predictions
>>>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions
>>>>> evmab:genemark=5 #weight for genemark sourced predictions
>>>>>
>>>>> and Column 2 in the genemark.gff is "GeneMark.hmm", then the value from "evmab" (=5) will be used, is that correct?
>>>>>
>>>>> Best Regards,
>>>>> Ray
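The lookup Ray is asking about can be illustrated with a minimal Python sketch. It is not MAKER or EVM code. It assumes that a weight is found by an exact, case-sensitive match of the GFF3 column 2 value against the part of the key after "evmab:", and that the plain "evmab" entry is the fallback, as the "#default weight for source unspecified ab initio predictions" comment suggests. Whether the real matching is case-sensitive is not stated anywhere in this thread. The function names are hypothetical.

```python
def parse_evm_weights(ctl_text):
    """Parse 'key=value #comment' lines into a dict of integer weights."""
    weights = {}
    for line in ctl_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        weights[key.strip()] = int(value)
    return weights


def weight_for(weights, evidence_class, source):
    """Exact-match lookup with fallback to the class default (assumed behaviour)."""
    return weights.get(f"{evidence_class}:{source}", weights.get(evidence_class))


# The config lines quoted in Ray's message.
config = """\
evmab=5 #default weight for source unspecified ab initio predictions
evmab:snap=5 #weight for snap sourced predictions
evmab:augustus=10 #weight for augustus sourced predictions
evmab:fgenesh=10 #weight for fgenesh sourced predictions
evmab:genemark=5 #weight for genemark sourced predictions
"""

weights = parse_evm_weights(config)
print(weight_for(weights, "evmab", "augustus"))      # keyed entry is used
print(weight_for(weights, "evmab", "GeneMark.hmm"))  # no exact key, falls back to evmab
```

Under these assumptions a column 2 value of "GeneMark.hmm" does not hit the "evmab:genemark" entry and gets the default weight, which is why Carson's advice to use exactly the string from the source column matters.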
>>>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt wrote:
>>>>>
>>>>>> Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. That column will also be used by EVM to bin features by their source and apply weights based on source.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui wrote:
>>>>>>
>>>>>> Thanks! I didn't know you can also name the gff, but I think using the default is fine, that's what I'm doing now.
>>>>>>
>>>>>> Best Regards,
>>>>>> Ray
>>>>>>
>>>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt wrote:
>>>>>>
>>>>>>> These are set in the maker_evm.ctl file.
>>>>>>>
>>>>>>> Use whatever you used in the source column of the input GFF3. For example, if column 2 is set as GENEMARK, then do this:
>>>>>>> evmab:GENEMARK=7
>>>>>>>
>>>>>>> This also works:
>>>>>>> evmab:pred_gff:GENEMARK=7
>>>>>>>
>>>>>>> Or just set the default:
>>>>>>> evmab=7
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui wrote:
>>>>>>>
>>>>>>> Dear Carson,
>>>>>>>
>>>>>>> I think it may be the most straightforward to input the GFF3 instead.
>>>>>>>
>>>>>>> What is the correct way of setting a weight for the EVM step for these GFF3 models passed through the pred_gff option?
>>>>>>>
>>>>>>> Ray
>>>>>>>
>>>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt wrote:
>>>>>>>
>>>>>>>> It may work as is as long as you don't need any of the additional options that have been added. If not, you can also just run it outside of MAKER and then provide the result in GFF3 format to pred_gff.
>>>>>>>>
>>>>>>>> --Carson
>>>>>>>>
>>>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui wrote:
>>>>>>>>
>>>>>>>> I see. Are there any recent plans to incorporate it into Maker?
>>>>>>>>
>>>>>>>> If not, I could try to see if I can adapt the current Maker script.
>>>>>>>>
>>>>>>>> Ray
>>>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt wrote:
>>>>>>>>
>>>>>>>>> Yes. This is a recent update. It's an attempt to merge GeneMark-ET and GeneMark-EP into GeneMark-ES scripts.
>>>>>>>>>
>>>>>>>>> --Carson
>>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui wrote:
>>>>>>>>>
>>>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap.
>>>>>>>>>
>>>>>>>>> I think there must have been a big update between different Genemark versions. It seems that they now also support evidence being fed into the prediction stage.
>>>>>>>>>
>>>>>>>>> The name of the latest version of the genemark script has been changed to "gmes_petap.pl", with the following command line options:
>>>>>>>>>
>>>>>>>>> Usage: /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] --sequence [filename]
>>>>>>>>>
>>>>>>>>> GeneMark-ES Suite version 4.33
>>>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
>>>>>>>>>
>>>>>>>>> Input sequence/s should be in FASTA format
>>>>>>>>>
>>>>>>>>> Algorithm options
>>>>>>>>>   --ES               to run self-training
>>>>>>>>>   --fungus           to run algorithm with branch point model (most useful for fungal genomes)
>>>>>>>>>   --ET               [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
>>>>>>>>>   --et_score         [number]; 4 (default) minimum score of intron in initiation of the ET algorithm
>>>>>>>>>   --evidence         [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
>>>>>>>>>   --training_only    to run only training step
>>>>>>>>>   --prediction_only  to run only prediction step
>>>>>>>>>   --predict_with     [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)
>>>>>>>>>
>>>>>>>>> Sequence pre-processing options
>>>>>>>>>   --max_contig       [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
>>>>>>>>>   --min_contig       [number]; 50000 (default); will ignore contigs shorter then min_contig in training
>>>>>>>>>   --max_gap          [number]; 5000 (default); will split sequence at gaps longer than max_gap
>>>>>>>>>                      Letters 'n' and 'N' are interpreted as standing within gaps
>>>>>>>>>   --max_mask         [number]; 5000 (default); will split sequence at repeats longer then max_mask
>>>>>>>>>                      Letters 'x' and 'X' are interpreted as results of hard masking of repeats
>>>>>>>>>   --soft_mask        [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
>>>>>>>>>
>>>>>>>>> Run options
>>>>>>>>>   --cores            [number]; 1 (default) to run program with multiple threads
>>>>>>>>>   --pbs              to run on cluster with PBS support
>>>>>>>>>   --v                verbose
>>>>>>>>>
>>>>>>>>> Customizing parameters:
>>>>>>>>>   --max_intron            [number]; default 10000 (3000 fungi), maximum length of intron
>>>>>>>>>   --max_intergenic        [number]; default 10000, maximum length of intergenic regions
>>>>>>>>>   --min_gene_prediction   [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
>>>>>>>>>
>>>>>>>>> Developer options:
>>>>>>>>>   --usr_cfg          [filename]; to customize configuration file
>>>>>>>>>   --ini_mod          [filename]; use this file with parameters for algorithm initiation
>>>>>>>>>   --test_set         [filename]; to evaluate prediction accuracy on the given test set
>>>>>>>>>   --key_bin
>>>>>>>>>   --debug
>>>>>>>>> # -------------------
>>>>>>>>>
>>>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt wrote:
>>>>>>>>>
>>>>>>>>>> Also note that the gmhmme3 executable distributed with different flavors of genemark has had the same name but has been quite different in both command line structure and output between flavors.
>>>>>>>>>>
>>>>>>>>>> --Carson
>>>>>>>>>>
>>>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters automatically set by Maker when calling Genemark?
>>>>>>>>>> If you can point me to the part of the maker source code that constructs the final genemark command line I can also take a look.
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt wrote:
>>>>>>>>>>
>>>>>>>>>>> The names of scripts used are listed in the maker_exe.ctl file. It depends on whether formatting or any flags have changed between versions.
>>>>>>>>>>>
>>>>>>>>>>> --Carson
>>>>>>>>>>>
>>>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui wrote:
>>>>>>>>>>>
>>>>>>>>>>> Dear Carson,
>>>>>>>>>>>
>>>>>>>>>>> I have now run GeneMark-ET, and it produces a trained .mod file. I think it can then be passed to Maker. Do you know what the final constructed command line in Maker that calls genemark is?
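The usage text quoted in this message lists the flags that a training run with RNA-Seq introns and a later prediction run with a trained .mod file would presumably use. The Python sketch below only assembles such argument lists and prints them; it does not run anything. It is not the command line that MAKER itself constructs (the thread does not show that), the file names are placeholders, gmes_petap.pl is assumed to be on PATH, and whether these flags combine exactly this way should be checked against the GeneMark documentation.

```python
import shlex


def gmes_training_cmd(genome_fasta, introns_gff, cores=1, et_score=4):
    """Argument list for a training-only run with intron hints (assumed flag combination)."""
    return [
        "gmes_petap.pl",
        "--sequence", genome_fasta,
        "--ET", introns_gff,
        "--et_score", str(et_score),
        "--training_only",
        "--cores", str(cores),
    ]


def gmes_predict_cmd(genome_fasta, mod_file, cores=1):
    """Argument list for prediction with an existing parameter file."""
    return [
        "gmes_petap.pl",
        "--sequence", genome_fasta,
        "--predict_with", mod_file,
        "--cores", str(cores),
    ]


def as_shell(cmd):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Placeholder file names, for illustration only.
print(as_shell(gmes_training_cmd("genome.fasta", "introns.gff", cores=8)))
print(as_shell(gmes_predict_cmd("genome.fasta", "gmhmm.mod", cores=8)))
```

Building the command as a list rather than a single string keeps file names with spaces intact if the list is later handed to subprocess.run.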
From carsonhh at gmail.com Thu Mar 16 11:30:16 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 16 Mar 2017 11:30:16 -0600
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To:
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu>
 <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com>
 <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com>
 <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com>
 <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com>
 <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com>
 <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
Message-ID:

1. Verify that the issue is not being caused by hints from evidence (i.e. that you aren't feeding fused mRNA-seq assemblies or protein evidence). Fused evidence will result in hints that fuse models.
2. If you still have an issue, then drop SNAP. Not all predictors work well on all genomes.

Also, no one can post to the Google group. It's just for archival. All messages have to go to the mailing list here, and they then get archived on Google:
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

The mailing list log shows that you requested to unsubscribe earlier today.

--Carson

> On Mar 16, 2017, at 11:22 AM, Ray Cui wrote:
>
> Hi Carson,
>
> for some reason I can't seem to post anymore on the google group.
>
> After looking at the results, it appears that SNAP performs poorly compared to genemark-ET and augustus. It looks like it's very prone to fusing neighboring genes and getting false positives. Is that a general thing you see in vertebrate genomes with SNAP? I saw that you didn't recommend SNAP for primates, perhaps the issue is similar?
>
> Attached you can see a screen shot of IGV browser, with all evidence tracks separated.
>
> Ray
Rongfeng (Ray) Cui > Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing > Wissenschaftlicher MA / Postdoctoral researcher > Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne > Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne > Tel.:+49 (0)221 496 > Mobile: +49 0221 37970 496 <> > rcui at age.mpg.de > www.age.mpg.de > > > > On Thu, Mar 16, 2017 at 5:02 PM, Ray Cui > wrote: > Dear Carson, > > thank you for the explanation! Now I see why sometimes it seems that EVM doesn't produce any model for a particular cluster. > > Best Regards, > Ray > > Dr. Rongfeng (Ray) Cui > Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing > Wissenschaftlicher MA / Postdoctoral researcher > Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne > Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne > Tel.:+49 (0)221 496 > Mobile: +49 0221 37970 496 <> > rcui at age.mpg.de > www.age.mpg.de > > > > On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt > wrote: > Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR for example (or hint based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied. Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but are only there for reference purposes (there will also be a raw fasta from each source, but only for reference purposes). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail in the MAKER wiki and the MAKER2 paper for how models compete. > > So the final result is not a superset, rather a merged subset from each potential source. 
>
> EVM is not used to obtain a consensus gene model. Its results compete just like all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn't work it produces either no model or partial models.
>
> --Carson
>
>> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>>
>> Dear Carson,
>>
>> thank you so much! I am now peeking into the results for the finished scaffolds. In the gff file, the gene id confuses me a bit. In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm", "augustus" etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?
>>
>> Best Regards,
>> Ray
>>
>> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
>> Maybe. I haven't tested this, but it should work. Maker supports labels for input by placing a ":" and a label after each file name.
>>
>> Example -->
>> est=file1.fasta:label_1,file2.fasta:label_2
>>
>> If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.
>>
>> As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3 -->
>> evmtrans:est2genome:label_1=10
>>
>> I don't know if the label will force any raw analysis to rerun, but it shouldn't.
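The labeling idea can be written out as a small helper that turns a labeled input value into the matching EVM weight lines. This is a rough sketch of the convention described in the message (which its author notes is untested), not MAKER code. The function names are hypothetical, the single shared weight is a simplification, and file paths that themselves contain ":" are not handled.

```python
def parse_labeled_inputs(value):
    """Split 'file:label,file:label' into (path, label) pairs; label may be None."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Assumption: the first ':' separates the path from the label.
        path, sep, label = item.partition(":")
        pairs.append((path, label if sep else None))
    return pairs


def evm_weight_lines(value, analysis, weight, prefix="evmtrans"):
    """Build 'prefix:analysis:label=weight' lines, one per labeled input file."""
    lines = []
    for _path, label in parse_labeled_inputs(value):
        source = f"{analysis}:{label}" if label else analysis
        lines.append(f"{prefix}:{source}={weight}")
    return lines


if __name__ == "__main__":
    est_value = "file1.fasta:label_1,file2.fasta:label_2"
    for line in evm_weight_lines(est_value, "est2genome", 10):
        print(line)
```

The key point is that the label in the control-file value and the label in the EVM setting have to be spelled identically, since the match is made on the text in column 2 of the GFF3.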
>>
>> --Carson
>>
>>> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote:
>>>
>>> Hi Carson,
>>>
>>> currently I am partitioning the protein evidence based on phylogenetic relationship into several datasets, supplied as a comma-delimited list. Is it possible then to specify a higher weight for protein2genome models from more closely related species than for more distantly related taxa?
>>>
>>> Ray
>>>
>>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui wrote:
>>> Dear Carson,
>>>
>>> thank you for the pointers! Before running the first round of Maker, I mapped conspecific Trinity assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained Genemark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see.
>>>
>>> It is good to know that MAKER will reuse the old results.
>>>
>>> Best Regards,
>>> Ray
>>>
>>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt wrote:
>>> You can find lots of info in the devel archives on training.
>>> Example --> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>>>
>>> Also an example of training SNAP on the wiki --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>>>
>>> MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them. The exception to this is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. Also, they are not archived because doing so created IO issues (exonerate is not run in bulk batches like BLAST, but rather as multiple small separate runs for each polished read, and archiving a lot of small raw reports can occur so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate rather than develop a database-like bundling/compression mechanism to get around the IO issues.
>>>
>>> Thanks,
>>> Carson
>>>
>>>> On Mar 14, 2017, at 10:44 AM, Ray Cui wrote:
>>>>
>>>> Hi Carson,
>>>> Thanks for your prompt response!
>>>>
>>>> I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models?
>>>> After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc) in the second iteration, since they are exactly the same as the first run. Perhaps maker can take in the alignment results from the previous run?
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui wrote:
>>>> I see. If my evm config looks like this:
>>>> evmab=5 #default weight for source unspecified ab initio predictions
>>>> evmab:snap=5 #weight for snap sourced predictions
>>>> evmab:augustus=10 #weight for augustus sourced predictions
>>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions
>>>> evmab:genemark=5 #weight for genemark sourced predictions
>>>>
>>>> and Column 2 in the genemark.gff is "GeneMark.hmm", then the value from "evmab" (=5) will be used, is that correct?
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt wrote:
>>>> Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. That column will also be used by EVM to bin features by their source and apply weights based on source.
>>>>
>>>> --Carson
>>>>
>>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui wrote:
>>>>>
>>>>> Thanks! I didn't know you can also name the gff, but I think using the default is fine, that's what I'm doing now.
>>>>>
>>>>> Best Regards,
>>>>> Ray
>>>>>
>>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt wrote:
>>>>>
>>>>> These are set in the maker_evm.ctl file.
>>>>>
>>>>> Use whatever you used in the source column of the input GFF3. For example if column 2 is set as GENEMARK, then do this -->
>>>>> evmab:GENEMARK=7
>>>>>
>>>>> This also works -->
>>>>> evmab:pred_gff:GENEMARK=7
>>>>>
>>>>> Or just set the default -->
>>>>> evmab=7
>>>>>
>>>>> --Carson
>>>>>
>>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui wrote:
>>>>>>
>>>>>> Dear Carson,
>>>>>>
>>>>>> I think it may be the most straightforward to input the GFF3 instead.
>>>>>>
>>>>>> What is the correct way of setting a weight for the EVM step for these GFF3 models passed through the pred_gff option?
>>>>>>
>>>>>> Ray
>>>>>>
>>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt wrote:
>>>>>> It may work as is as long as you don't need any of the additional options that have been added. If not, you can also just run it outside of MAKER then provide the result in GFF3 format to pred_gff.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui wrote:
>>>>>>>
>>>>>>> I see.
>>>>>>> Are there any recent plans to incorporate it into Maker?
>>>>>>>
>>>>>>> If not, I could try to see if I can adapt the current Maker script.
>>>>>>>
>>>>>>> Ray
>>>>>>>
>>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt wrote:
>>>>>>> Yes. This is a recent update. It's an attempt to merge GeneMark-ET and GeneMark-EP into GeneMark-ES scripts.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui wrote:
>>>>>>>>
>>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap.
>>>>>>>>
>>>>>>>> I think there must have been a big update between different Genemark versions. It seems that they now also support evidence being fed into the prediction stage.
>>>>>>>>
>>>>>>>> The name of the latest version of the genemark script has been changed to "gmes_petap.pl", with the following command lines options:
>>>>>>>>
>>>>>>>> Usage: /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] --sequence [filename]
>>>>>>>>
>>>>>>>> GeneMark-ES Suite version 4.33
>>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
>>>>>>>>
>>>>>>>> Input sequence/s should be in FASTA format
>>>>>>>>
>>>>>>>> Algorithm options
>>>>>>>> --ES to run self-training
>>>>>>>> --fungus to run algorithm with branch point model (most useful for fungal genomes)
>>>>>>>> --ET [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
>>>>>>>> --et_score [number]; 4 (default) minimum score of intron in initiation of the ET algorithm
>>>>>>>> --evidence [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
>>>>>>>> --training_only to run only training step
>>>>>>>> --prediction_only to run only prediction step
>>>>>>>> --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)
>>>>>>>>
>>>>>>>> Sequence pre-processing options
>>>>>>>> --max_contig [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
>>>>>>>> --min_contig [number]; 50000 (default); will ignore contigs shorter then min_contig in training
>>>>>>>> --max_gap [number]; 5000 (default); will split sequence at gaps longer than max_gap
>>>>>>>> Letters 'n' and 'N' are interpreted as standing within gaps
>>>>>>>> --max_mask [number]; 5000 (default); will split sequence at repeats longer then max_mask
>>>>>>>> Letters 'x' and 'X' are interpreted as results of hard masking of repeats
>>>>>>>> --soft_mask [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
>>>>>>>>
>>>>>>>> Run options
>>>>>>>> --cores [number]; 1 (default) to run program with multiple threads
>>>>>>>> --pbs to run on cluster with PBS support
>>>>>>>> --v verbose
>>>>>>>>
>>>>>>>> Customizing parameters:
>>>>>>>> --max_intron [number]; default 10000 (3000 fungi), maximum length of intron
>>>>>>>> --max_intergenic [number]; default 10000, maximum length of intergenic regions
>>>>>>>> --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
>>>>>>>>
>>>>>>>> Developer options:
>>>>>>>> --usr_cfg [filename]; to customize configuration file
>>>>>>>> --ini_mod [filename]; use this file with parameters for algorithm initiation
>>>>>>>> --test_set [filename]; to evaluate prediction accuracy on the given test set
>>>>>>>> --key_bin
>>>>>>>> --debug
>>>>>>>> # -------------------
>>>>>>>>
>>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt wrote:
>>>>>>>> Also note that the gmhmme3 executable distributed with different flavors of genemark has had the same name but has been quite different in both command line structure and output between flavors.
>>>>>>>>
>>>>>>>> --Carson
>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui wrote:
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters automatically set by Maker when calling Genemark?
>>>>>>>>> If you can point me to the part of the maker source code that construct the final genemark command line I can also take a look.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Ray
>>>>>>>>>
>>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt wrote:
>>>>>>>>> The names of scripts used are listed in the maker_exe.ctl file. It depends on whether formatting or any flags have changed between versions.
>>>>>>>>>
>>>>>>>>> --Carson
>>>>>>>>>
>>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui wrote:
>>>>>>>>>>
>>>>>>>>>> Dear Carson,
>>>>>>>>>>
>>>>>>>>>> I have now run GeneMark-ET, and it produces a trained .mod file. I think it can then be passed to Maker. Do you know what is the final constructed command line in Maker that calls genemark? Genemark-et and es use the same perl script so one probably only needs to use the --prediction and --predict_with xxx.mod options to predict genes using the species specific parameters (bypassing regular training and prediction steps)
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt wrote:
>>>>>>>>>> MAKER support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER related archive posts etc. will relate to GeneMark-ES.
>>>>>>>>>>
>>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don't know how this compares to GeneMark-ET.
>>>>>>>>>>
>>>>>>>>>> --Carson
>>>>>>>>>>
>>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>
>>>>>>>>>>> thanks! It seems that Genemark-ET has a "--training" flag, is that the flag I should use when training or should I just let Genemark also perform the prediction?
>>>>>>>>>>>
>>>>>>>>>>> Ray
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel wrote:
>>>>>>>>>>> Hi Ray,
>>>>>>>>>>>
>>>>>>>>>>> I think you're on the right track with training Genemark with RNAseq data. It should only change the training steps, which are external to MAKER, but not how MAKER runs Genemark. You'll still give MAKER the path to the "es.mod" file made by Genemark.
>>>>>>>>>>>
>>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence.
>>>>>>>>>>>
>>>>>>>>>>> ~Daniel
>>>>>>>>>>>
>>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I have successfully installed Maker beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictors.
>>>>>>>>>>>> When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate Genemark-ET predictions into Maker. Should I run Genemark-ET independent of Maker, then integrate the GFF at some point during the maker process? If so, how should I edit the configuration file? Currently maker has an option called "gmhmm". Should I then train GeneMark by myself with RNAseq data, then feed the hmm to maker?
>>>>>>>>>>>>
>>>>>>>>>>>> And perhaps an unrelated question is that now Maker beta 3 supports EVM. I'm wondering how EVM is used by Maker (at which step, what does it do), and how it differs from what Maker is designed for (both reconcile different gene models).
>>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Ray
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> maker-devel mailing list
>>>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From qwzhang0601 at gmail.com Thu Mar 16 21:48:10 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Thu, 16 Mar 2017 23:48:10 -0400
Subject: [maker-devel] split genes
Message-ID:

Hello:

If one gene is covered by two contigs, sometimes two genes may be predicted. I wonder how MAKER deals with such conditions?
Even if MAKER tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?

As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the MAKER pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, then merge the predicted genes into one?

Thanks

Best
Quanwei

From carsonhh at gmail.com Fri Mar 17 09:21:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Mar 2017 09:21:10 -0600
Subject: [maker-devel] split genes
In-Reply-To:
References:
Message-ID: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>

MAKER will not try to predict a gene across contigs because it is too difficult to determine contig order. If you are able to determine order, then it is best to merge the contigs into a single scaffold before annotating rather than try to produce split models in GFF3.

--Carson

> On Mar 16, 2017, at 9:48 PM, Quanwei Zhang wrote:
>
> Hello:
>
> If one gene is covered by two contigs, sometimes two genes may be predicted. I wonder how MAKER deals with such conditions?
> Even if MAKER tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?
>
> As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the MAKER pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, then merge the predicted genes into one?
>
> Thanks
>
> Best
> Quanwei

From qwzhang0601 at gmail.com Fri Mar 17 11:49:06 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Fri, 17 Mar 2017 13:49:06 -0400
Subject: [maker-devel] split genes
In-Reply-To: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>
References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>
Message-ID:

Thank you for your explanation. But do you have any suggestions on such issues? Are there any tools to detect such split genes, or any other tool that can further improve the gene models obtained by MAKER?

Thanks.

Best
Quanwei

2017-03-17 11:21 GMT-04:00 Carson Holt:

> MAKER will not try to predict a gene across contigs because it is too difficult to determine contig order. If you are able to determine order, then it is best to merge the contigs into a single scaffold before annotating rather than try to produce split models in GFF3.
>
> --Carson
>
> > On Mar 16, 2017, at 9:48 PM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > If one gene is covered by two contigs, sometimes two genes may be predicted. I wonder how MAKER deals with such conditions?
> > Even if MAKER tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?
> >
> > As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the MAKER pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, then merge the predicted genes into one?
> >
> > Thanks
> >
> > Best
> > Quanwei

From qwzhang0601 at gmail.com Fri Mar 17 16:37:16 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Fri, 17 Mar 2017 18:37:16 -0400
Subject: [maker-devel] putative gene function by mapping to UniProt/Swiss-prot set
Message-ID:

Hello:

I have a question about assigning putative gene function by mapping to the UniProt/Swiss-Prot gene set (described in the protocol published in 2014).
Here, for each gene model from MAKER, the pipeline will find the most similar protein in UniProt/Swiss-Prot and assign the function of the matched protein, right?
It does not require a best reciprocal hit, right?

Thanks
Best
Quanwei

From michael.s.campbell1 at gmail.com Mon Mar 20 07:03:10 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 20 Mar 2017 09:03:10 -0400
Subject: [maker-devel] putative gene function by mapping to UniProt/Swiss-prot set
In-Reply-To:
References:
Message-ID:

Hi Quanwei,

Correct. Just the best hit when blasting the MAKER generated fasta sequences to Swiss-prot.

Thanks,
Mike

> On Mar 17, 2017, at 6:37 PM, Quanwei Zhang wrote:
>
> Hello:
>
> I have a question about assigning putative gene function by mapping to the UniProt/Swiss-Prot gene set (described in the protocol published in 2014).
> Here, for each gene model from MAKER, the pipeline will find the most similar protein in UniProt/Swiss-Prot and assign the function of the matched protein, right?
> It does not require a best reciprocal hit, right?
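As a rough illustration of a one-directional "best hit" assignment (as opposed to a reciprocal best hit), the sketch below keeps one subject per query from BLAST tabular output. It is not the protocol's own script. It assumes the default 12-column tabular layout (-outfmt 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and assumes that "best" means highest bit score, which the thread does not specify. The function name and the sample rows are made up.

```python
def best_hits(tabular_lines):
    """Map each query ID to the (subject ID, bit score) pair with the highest bit score."""
    best = {}
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 12:
            continue  # skip comments and malformed rows
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # No reverse search is consulted: each query simply keeps its top subject.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        "gene1-RA\tsp|X00001|HYPA_MOUSE\t88.0\t200\t24\t0\t1\t200\t1\t200\t1e-90\t310",
        "gene1-RA\tsp|X00002|HYPB_MOUSE\t61.0\t190\t74\t2\t5\t194\t3\t190\t1e-40\t150",
        "gene2-RA\tsp|X00003|HYPC_RAT\t45.0\t120\t66\t1\t1\t120\t10\t129\t1e-12\t60.5",
    ]
    for query, (subject, score) in sorted(best_hits(rows).items()):
        print(query, subject, score)
```

Because only the forward search is used, two different gene models can end up with the same Swiss-Prot protein as their best hit.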
>
> Thanks
> Best
> Quanwei

From qwzhang0601 at gmail.com Mon Mar 20 11:09:28 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Mon, 20 Mar 2017 13:09:28 -0400
Subject: [maker-devel] evidence of transcript assembly
Message-ID:

Hello:

I am using MAKER2 to do gene annotation on a new rodent species. I have found some published RNA-seq data, and there are selected open reading frames. Generally, the transcript assembly was obtained with Trinity; after that, the raw transcript assemblies were mapped to the mouse genome and those with full or partial coverage of mouse genes were selected.

I have a question about transcript assembly evidence for MAKER. Which do you think is the best choice as evidence for MAKER2?
(1) All the Trinity transcript assemblies?
(2) Trinity transcript assemblies that fully cover the mouse genes?
(3) Trinity transcript assemblies that either fully or partly cover the mouse genes?

Many thanks

Best
Quanwei

From glenna.kramer at utoronto.ca Mon Mar 20 19:37:45 2017
From: glenna.kramer at utoronto.ca (Glenna Kramer)
Date: Tue, 21 Mar 2017 01:37:45 +0000
Subject: [maker-devel] GFF no longer valid after renaming genes
Message-ID: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>

Hi there,

I am hoping that you can give me some assistance with finishing up my maker annotated genome for submission. I have been able to rename the genes for GenBank submission using Support Protocol 2 in the paper by Campbell et al., "Genome Annotation and Curation Using MAKER and MAKER-P", Curr Protoc Bioinformatics. 2014; 48: 4.11.1-4.11.39. (PMC4286374).
I have also been able to use Support Protocol 3 from that same paper to assign a putative gene function. However, I am running into problems when I am trying to convert the GFF file to the tbl format for submission. I have tried to use scripts from GAG (Genome Annotation Generator) and maker (gff32table). Both of these scripts work wonderfully on the gff originally output from maker, but do not work once I rename the genes for GenBank submission. When I feed my file into a gff validator it turns out that my gff is valid prior to renaming, but after I rename the gff is no longer valid. I have been trying to troubleshoot what is happening to my gff when I rename as in Support Protocol 2, but am stumped. Has anyone else out there had a similar issue? I would be very thankful for any insight that you can provide!

Best,
Glenna

Not sure if this will be helpful, but here is an example gene from prior to renaming:

##gff-version 3
ChromoV|quiver|quiver maker gene 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9
ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189
ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1
ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1

And after renaming:

##gff-version 3
ChromoV|quiver|quiver maker gene 62081 62650 . + . ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;
ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00;
ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1;
ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1;

The commands I used were:

% maker_map_ids --prefix A9K44_ --justify 4 myfilename.gff > myfilename.map
% map_gff_ids myfilename.map myfilename.gff
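One check that can be run on both versions of the file is whether every Parent value still points at an ID that exists. In the renamed excerpt as posted, the ID values use the prefix A9K44 while the Parent values use A9K55, so none of the Parent references resolve. Whether that comes from the renaming step or from editing the example for the post is not clear from the message, so the sketch below is only a diagnostic aid, not an explanation of the failure. It assumes nothing beyond the nine-column layout, accepts either tabs or the single spaces shown in the message, and uses hypothetical function names.

```python
def parse_attributes(column9):
    """Parse 'key=value;key=value;' into a dict, tolerating a trailing semicolon."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def dangling_parents(gff_lines):
    """Return (feature type, parent ID) pairs whose Parent matches no ID in the input."""
    ids = set()
    children = []
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Real GFF3 is tab-separated; fall back to whitespace for pasted text.
        cols = line.split("\t") if "\t" in line else line.split(None, 8)
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children.append((cols[2], parent))
    return [(ftype, parent) for ftype, parent in children if parent not in ids]


if __name__ == "__main__":
    # Gene and mRNA records copied from the "after renaming" excerpt above.
    renamed = [
        "ChromoV|quiver|quiver maker gene 62081 62650 . + . "
        "ID=A9K44_2555|quiver|quiver-processed-gene-0.9;"
        "Name=A9K55_2555|quiver|quiver-processed-gene-0.9;"
        "Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;",
        "ChromoV|quiver|quiver maker mRNA 62081 62650 . + . "
        "ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;"
        "Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;"
        "Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;",
    ]
    for ftype, parent in dangling_parents(renamed):
        print(ftype, "has unresolved Parent:", parent)
```

Running the same function on the file from before renaming gives a baseline: if it reports nothing there but reports entries afterwards, the rename step changed IDs and Parent values inconsistently.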
URL: From adf at ncgr.org Mon Mar 20 19:49:22 2017 From: adf at ncgr.org (Andrew Farmer) Date: Mon, 20 Mar 2017 19:49:22 -0600 Subject: [maker-devel] GFF no longer valid after renaming genes In-Reply-To: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca> References: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca> Message-ID: <127be156-b2bd-574f-5187-9942f05220e2@ncgr.org> Hi Glenna- this may be totally off-base but I have a vague memory that some validators will complain about the semicolon after the last attribute in the column nine attribute list; it's not clear to me from the specification that this is truly illegal, but can imagine why a parser might not like to deal with it. In any case, you might try just removing that terminal semicolon character and see if that solves the validation complaint. but apologies in advance if my dim recollection has misled me into wasting your time... Andrew Farmer On 3/20/17 7:37 PM, Glenna Kramer wrote: > Hi there, > > I am hoping that you can give me some assistance with finishing up my > maker annotated genome for submission. I have been able to rename the > genes for GenBank submission - using Support Protocol 2 in the paper > by Campbell et. al "Genome Annotation and Curation Using MAKER and > MAKER-P" Curr Protoc Bioinformatics. 2014; 48: 4.11.1?4.11.39. > (PMC4286374). > I have also been able to use the Support Protocol 3 from that same > paper to assign a putative gene function. However, I am running into > problems when I am trying to convert the GFF file to the tbl format > for submission. I have tried to use scripts from GAG (Genome > Annotation Generator) and maker (gff32table). Both of these scripts > work wonderfully on the gff originally output from maker, but do not > work once I rename the genes for GenBank submission. 
When I feed my > file into a gff validator it turns out that my gff is valid prior to > renaming, but after I rename the gff is no longer valid. I have been > trying to troubleshoot what is happening to my gff when I rename as in > Support Protocol 2, but am stumped. Has anyone else out there had a > similar issue? I would be very thankful for any insight that you can > provide! > > Best, > Glenna > > Not sure if this will be helpful, but here is an example gene from > prior to renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9 > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189 > ChromoV|quiver|quiver maker exon 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > > And after renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . > ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9; > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . 
> ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00; > ChromoV|quiver|quiver maker exon 62081 62650 . + . > ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; > ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 > ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; > > The commands I used were: > > % maker_map_ids --prefix_A9K44_ --justify 4 myfilename.gff>myfilename.map > > %map_gff_ids myfilename.map myfilename.gff > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- ...all concepts in which an entire process is semiotically concentrated elude definition; only that which has no history is definable. Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 21 10:15:20 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Mar 2017 10:15:20 -0600 Subject: [maker-devel] GFF no longer valid after renaming genes In-Reply-To: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca> References: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca> Message-ID: <5DFD02E2-2C6F-49DA-90DE-9E17EE0A8CE2@gmail.com> The problem appears to be the multiple ?|? characters in your contig names (ChromoV|quiver|quiver). They end up in the gene ID, and since ?|? has a special meaning in perl, it creates weird replacement behavior. I?ve attached two scripts that will fix that. 
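The replacement behavior described here can be reproduced outside MAKER. What follows is only a sketch and not the code in map_gff_ids: it assumes the old ID ends up unescaped inside a regular-expression pattern, it uses sed -E as a stand-in for Perl, and the function names regex_replace and literal_replace are made up for illustration. With the gene ID from the example in this thread, the unescaped '|' acts as alternation, so only the part before the first '|' is replaced, which is the same shape as the ID reported after renaming.

```bash
#!/bin/sh
# Illustration only; these helpers are hypothetical and not part of MAKER.

# Regex replacement: the old ID is dropped into the pattern unescaped, so each
# '|' splits it into alternatives and only the first alternative that matches
# ("augustus_masked-ChromoV") is replaced.
regex_replace() {
    printf '%s\n' "$1" | sed -E "s/$2/$3/"
}

# Literal replacement: locate the old ID as a plain substring with index(),
# so no character in it is interpreted as a regex operator.
literal_replace() {
    printf '%s\n' "$1" | awk -v src="$2" -v dst="$3" '{
        out = ""
        while ((i = index($0, src)) > 0) {
            out = out substr($0, 1, i - 1) dst
            $0 = substr($0, i + length(src))
        }
        print out $0
    }'
}

old='augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9'
new='A9K44_2555'

regex_replace   "ID=${old}" "$old" "$new"   # prints ID=A9K44_2555|quiver|quiver-processed-gene-0.9
literal_replace "ID=${old}" "$old" "$new"   # prints ID=A9K44_2555
```

In Perl itself the usual way to get the literal behavior is to quote the pattern (for example with \Q...\E or quotemeta); whether the attached scripts do exactly that is not stated in the thread.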
Use the two attached scripts to replace their counterparts in the .../maker/bin/ and .../maker/src/bin/ directories, then rerun all renaming steps on a new GFF3 (not the one you already tried to rename).

Also, you may want to consider changing the IDs in the assembly itself before you release it or use it for analysis. You would want to remove the '|quiver|quiver' tail on every contig. That tail has the potential to cause hidden downstream analysis errors in other tools for the same reasons outlined above, since '|' characters have special meaning.

Thanks,
Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: map_fasta_ids
Type: application/octet-stream
Size: 1676 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: map_gff_ids
Type: application/octet-stream
Size: 5048 bytes
Desc: not available

From carsonhh at gmail.com Tue Mar 21 11:00:06 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 Mar 2017 11:00:06 -0600
Subject: [maker-devel] split genes
In-Reply-To:
References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>
Message-ID:

I have no suggestions, but maybe someone else on the list may have some.

--Carson

> On Mar 17, 2017, at 11:49 AM, Quanwei Zhang wrote:
>
> Thank you for your explanation. But do you have any suggestions on such issues? Are there any tools to detect such split genes, or any other tool that can further improve the gene models obtained by Maker? Thanks.
>
> Best
> Quanwei
>
> 2017-03-17 11:21 GMT-04:00 Carson Holt:
> MAKER will not try and predict a gene across contigs because it is too difficult to determine contig order. If you are able to determine order, then it is best to merge the contigs into a single scaffold before annotating rather than try and produce split models in GFF3.
>
> --Carson
>
> > On Mar 16, 2017, at 9:48 PM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > If one gene is covered by two contigs, sometimes two genes may be predicted. I wonder how Maker deals with such conditions?
> > Even if Maker tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?
> >
> > As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the Maker pipeline? For example, if one transcript covers 40% of the genes predicted in two contigs, are the predicted genes then merged into one?
> >
> > Thanks
> >
> > Best
> > Quanwei
> >
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue Mar 21 11:01:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 Mar 2017 11:01:30 -0600
Subject: [maker-devel] evidence of transcript assembly
In-Reply-To:
References:
Message-ID: <297B9C95-919E-4D4F-9103-1FED1550B745@gmail.com>

Different sources of data will have different levels of quality. You may want to run them all, then look at the results in a browser like Apollo. If specific sources look more problematic than others, then drop them.

--Carson

> On Mar 20, 2017, at 11:09 AM, Quanwei Zhang wrote:
>
> Hello:
>
> I am using Maker2 to do gene annotation on a new rodent species. I have found some published RNA-seq data and there are selected open reading frames. Generally they get the transcript assembly through Trinity; after that they mapped the raw transcript assemblies to the mouse genome and selected those with full or partial coverage of mouse genes. I have a question about the transcript assembly evidence for Maker. Which do you think is the best choice as evidence for Maker2?
> (1) All the Trinity transcript assemblies?
> (2) Trinity transcript assemblies that fully cover the mouse genes?
> (3) Trinity transcript assemblies that either fully or partly cover the mouse genes?
>
> Many thanks
>
> Best
> Quanwei

From cjfields at illinois.edu Tue Mar 21 11:47:21 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 21 Mar 2017 17:47:21 +0000
Subject: [maker-devel] split genes
In-Reply-To:
References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>
Message-ID:

Just curious, but have you tried scaffolding your assembly using your RNA-Seq de novo assembly data? We've seen some improvement in BUSCO calls and annotation after doing this with L_RNA_Scaffolder (though you do need to be a bit careful and try reducing your transcript assembly down to a somewhat non-redundant set).

chris

From rainer.rutka at uni-konstanz.de Fri Mar 24 03:10:45 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Fri, 24 Mar 2017 10:10:45 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID:

HI!
First of all, thank you for your previous help.
Maker 2.31.9 with MPI (Intel) runs fine if we use ONE node only.

But if we try to combine more than one node (e.g. 2 nodes with 8 cores each), we get this error:

[...]
### Running Maker example

MOAB_PROCCOUNT: 16
slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead
STATUS: Parsing control files...
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
[...]

/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356 was created beforehand and EXISTS for the whole duration of the job.

I attached the complete log to this e-mail.

Again: THANK YOU VERY MUCH.

All the best.

--
Rainer Rutka
Universität Konstanz
Kommunikations-, Informations-, Medienzentrum (KIM)
* KIM Ausbildung
* Wissenschaftliches Rechnen/bwHPC-C5
* KIM Basisdienste, KIM Support
Room: V511
78457 Konstanz
+49 7531 88-5413

-------------- next part --------------
#!/bin/bash
#MSUB -N maker-job
#MSUB -j oe
#MSUB -o $(JOBNAME).$(JOBID)
#MSUB -m ae
# -M given_name.family_name at your-uni.de
#MSUB -l nodes=2:ppn=8
#MSUB -l mem=20gb
#MSUB -l walltime=01:00:00
#
start=$(date +%s)
echo " "
echo "### Setting up shell environment ..."
echo " "
#
if test -e "/etc/profile"; then source "/etc/profile"; fi;
if test -e "$HOME/.bash_profile"; then source "$HOME/.bash_profile"; fi;
unset LANG; export LC_ALL="C"; export MKL_NUM_THREADS=1; export OMP_NUM_THREADS=1
export USER=${USER:=`logname`}
export MOAB_JOBID=${MOAB_JOBID:=`date +%s`}
export MOAB_SUBMITDIR=${MOAB_SUBMITDIR:=`pwd`}
export MOAB_JOBNAME=${MOAB_JOBNAME:=`basename "$0"`}
export MOAB_JOBNAME=$(echo "${MOAB_JOBNAME}" | sed 's/[^a-zA-Z0-9._-]/_/g')
export MOAB_NODECOUNT=${MOAB_NODECOUNT:=1}
export MOAB_PROCCOUNT=${MOAB_PROCCOUNT:=1}
ulimit -s 200000
echo " "
echo "### Printing basic job infos to stdout ..."
echo " "
echo "START_TIME = `date +'%y-%m-%d %H:%M:%S %s'`"
echo "HOSTNAME = ${HOSTNAME}"
echo "USER = ${USER}"
echo "MOAB_JOBNAME = ${MOAB_JOBNAME}"
echo "MOAB_JOBID = ${MOAB_JOBID}"
echo "MOAB_SUBMITDIR = ${MOAB_SUBMITDIR}"
echo "MOAB_NODECOUNT = ${MOAB_NODECOUNT}"
echo "MOAB_PROCCOUNT = ${MOAB_PROCCOUNT}"
echo "SLURM_NODELIST = ${SLURM_NODELIST}"
echo "PBS_NODEFILE = ${PBS_NODEFILE}"
if test -f "${PBS_NODEFILE}"; then
  echo "PBS_NODEFILE (begin) ---------------------------------"
  NO_NODES=$(wc -l < ${PBS_NODEFILE})
  cat "${PBS_NODEFILE}"
  echo "PBS_NODEFILE (end) -----------------------------------"
else
  NO_NODES=1
fi
#
##############################################################################
echo " "
echo "### Creating TMP_WORK_DIR directory and changing to it ..."
echo " "
# Using "/tmp/$USER" should be ok for one node jobs. In case of multi-node jobs
# it might be necessary to modify TMP_BASE_DIR to point to SLURM_SUBMIT_DIR
# or to create (and delete) TMP_WORK_DIR on each node (job-type dependent).
# NEVER EVER calculate in your home directory.
JOB_WORK_DIR="${SLURM_JOB_NAME}.uc1.${SLURM_JOB_ID%%.*}.$(date +%y%m%d_%H%M%S)"
if test -z "$SLURM_NNODES" -o "$SLURM_NNODES" = "1"
then
  TMP_BASE_DIR="/tmp/${USER}"
else
  # in case of 2 or more nodes, use a common scratch dir available on all nodes...
  TMP_BASE_DIR="$SLURM_SUBMIT_DIR"
fi
TMP_WORK_DIR="${TMP_BASE_DIR}/${JOB_WORK_DIR}"
echo "JOB_WORK_DIR = ${JOB_WORK_DIR}"
echo "TMP_BASE_DIR = ${TMP_BASE_DIR}"
echo "TMP_WORK_DIR cd = ${TMP_WORK_DIR}"
mkdir -vp "${TMP_WORK_DIR}" && { cd "${TMP_WORK_DIR}"; pwd; } || { echo "ERROR: cd $TMP_WORK_DIR"; exit 1; }
# Remarks:
# * The job's temporary subdirectory JOB_WORK_DIR consists of SLURM_JOB_NAME
#   and SLURM_JOB_ID connected by ".uc1.". This is a little bit of magic since
#   the output file of your job follows the same rule. Therefore the
#   sorting of files belonging to one job will work nicely, when you
#   list the result files later in the submit directory (SLURM_SUBMIT_DIR).
# * Using TMP_BASE_DIR="/tmp/$USER" is ok, if the job requires less
#   than 3.6 TB of node local disk space (for details see "www.bwhpc-c5.de").
#
##############################################################################
echo " "
echo "### Loading MAKER module:"
echo " "
module load bio/maker/2.31.9
[ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
echo "MAKER_VERSION = $MAKER_VERSION"
module list
echo " "
echo "### Copying input examples files for job:"
echo " "
cp -v ${MAKER_EXA_DIR}/*.{fasta,ctl} .
sleep 2
echo " "
echo "### Display internal Maker/bwHPC environments..."
echo " "
echo "MAKER_BIN_DIR = ${MAKER_BIN_DIR}"
echo "MAKER_EXA_DIR = ${MAKER_EXA_DIR}"
echo ""
echo " "
echo "### Runing Maker example"
echo " "
export OMPI_MCA_mpi_warn_on_fork=0
#
# Do NOT use mpiexec here. Unfortunately this crashes
# "STATUS: Processing and indexing input FASTA files..."
# exec.hydra -n 2 maker -h
echo "MOAB_PROCCOUNT: ${MOAB_PROCCOUNT:=1}"
# do NOT use mpiexec. use mpiexec.hydra or mpirun.
# mpirun -n ${MOAB_PROCCOUNT} maker -h
# mpirun -n ${MOAB_PROCCOUNT} maker 2>&1 >maker_$(date +%Y-%m-%d_%H:%M:%S).out
mpirun -n ${MOAB_PROCCOUNT} maker
echo "### Cleaning up files ... removing unnecessary scratch files ..."
echo " "
# rm -fv
sleep 3 # Sleep some time so potential stale nfs handles can disappear.
echo " "
echo "### Compressing results and copying back result archive ..."
echo " "
cd "${TMP_BASE_DIR}"
mkdir -vp "${MOAB_SUBMITDIR}" # if user has deleted or moved the submit dir
echo "Creating result tgz-file '${MOAB_SUBMITDIR}/${JOB_WORK_DIR}.tgz' ..."
tar -zcvf "${MOAB_SUBMITDIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \
  || { echo "ERROR: Failed to create tgz-file. Please cleanup TMP_WORK_DIR '$TMP_WORK_DIR' on host '$HOSTNAME' manually (if not done automatically by queueing system)."; exit 102; }
# Remarks:
# * The resulting tgz file is copied back to the submit directory.
#   The name of the tgz file looks similar to
#   "bwunicluster-maker-example.moab.275.110528_101755.tgz"
echo " "
echo "### Final cleanup: Remove TMP_WORK_DIR ..."
echo " "
rm -rvf "${TMP_WORK_DIR}"
echo "END_TIME = `date +'%y-%m-%d %H:%M:%S %s'`"
end=$(date +%s)
echo " "
echo "### Calculate duration ..."
echo " "
diff=$[end-start]
if [ $diff -lt 60 ]; then
  echo "Runtime (approx.): '$diff' secs"
elif [ $diff -ge 60 ]; then
  echo 'Runtime (approx.): '$[$diff / 60] 'min(s) '$[$diff % 60] 'secs'
fi
-------------- next part --------------

### Setting up shell environment ...

### Printing basic job infos to stdout ...

START_TIME = 17-03-24 04:35:21 1490326521
HOSTNAME = uc1n385
USER = kn_pop235844
MOAB_JOBNAME = maker-job
MOAB_JOBID = 11658541
MOAB_SUBMITDIR = /pfs/work2/workspace/scratch/kn_pop235844-wstest-0
MOAB_NODECOUNT = 2
MOAB_PROCCOUNT = 16
SLURM_NODELIST = uc1n[385,397]
PBS_NODEFILE =

### Creating TMP_WORK_DIR directory and changing to it ...

JOB_WORK_DIR = maker-job.uc1.11658541.170324_043521
TMP_BASE_DIR = /tmp/kn_pop235844
TMP_WORK_DIR cd = /tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521
mkdir: created directory '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521'
/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521

### Loading MAKER module:

MAKER_VERSION = 2.31.9
Currently Loaded Modulefiles:
  1) compiler/intel/16.0(default) 2) mpi/impi/5.1.3-intel-16.0(default) 3) bio/maker/2.31.9

### Copying input examples files for job:

'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_contig.fasta' -> './dpp_contig.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_est.fasta' -> './dpp_est.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_protein.fasta' -> './dpp_protein.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_bopts.ctl' -> './maker_bopts.ctl'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_exe.ctl' -> './maker_exe.ctl'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_opts.ctl' -> './maker_opts.ctl'

### Display internal Maker/bwHPC environments...

MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.9/bin
MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples

### Runing Maker example

MOAB_PROCCOUNT: 16
slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521': No such file or directory: going to /tmp instead
STATUS: Parsing control files...
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
### Cleaning up files ... removing unnecessary scratch files ...

### Compressing results and copying back result archive ...

Creating result tgz-file '/pfs/work2/workspace/scratch/kn_pop235844-wstest-0/maker-job.uc1.11658541.170324_043521.tgz' ...
maker-job.uc1.11658541.170324_043521/
maker-job.uc1.11658541.170324_043521/dpp_contig.fasta
maker-job.uc1.11658541.170324_043521/dpp_est.fasta
maker-job.uc1.11658541.170324_043521/dpp_protein.fasta
maker-job.uc1.11658541.170324_043521/maker_bopts.ctl
maker-job.uc1.11658541.170324_043521/maker_exe.ctl
maker-job.uc1.11658541.170324_043521/maker_opts.ctl
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/.NFSLock.gi_lock.NFSLock
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_opts.log
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_bopts.log
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_exe.log

### Final cleanup: Remove TMP_WORK_DIR ...

removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_est.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_protein.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_bopts.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_exe.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_opts.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/.NFSLock.gi_lock.NFSLock'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_opts.log'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_bopts.log'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_exe.log'
removed directory: '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output'
removed directory: '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521'
END_TIME = 17-03-24 04:36:08 1490326568

### Calculate duration ...

Runtime (approx.): '47' secs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature

From carsonhh at gmail.com Fri Mar 24 09:00:58 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 24 Mar 2017 09:00:58 -0600
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID: <2D6022EE-3AFC-4B87-99A3-2D310995A844@gmail.com>

This error -->

slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead

It is from SLURM and not from MAKER. It occurs before your job has even started. It comes from the SLURM initialization of one of the nodes you are using. Note that /tmp is not shared; it is independent on each node. So /tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356 may exist on one node, but not on the others. Since you are somehow setting this directory before you launch the job, SLURM complains because it doesn't exist on one of the other nodes during initialization. So you need to review how you are launching things.

--Carson
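One possible way to act on this advice, given only as a sketch under assumptions: the posted log shows TMP_BASE_DIR = /tmp/kn_pop235844 although MOAB_NODECOUNT = 2, so the node-count check in the job script evidently did not select a shared directory. The helper below, pick_base_dir (a hypothetical name), takes the node count, a directory assumed to be visible on all nodes, and the user name, and returns node-local /tmp only for single-node jobs. Whether MOAB_NODECOUNT and MOAB_SUBMITDIR are the right variables on a given cluster is an assumption that should be checked locally.

```bash
#!/bin/sh
# Sketch only: choose the job's base directory from the node count.
# Arguments: <node count> <directory shared by all nodes> <user name>.
# pick_base_dir is a hypothetical helper, not part of MAKER, SLURM or MOAB.
pick_base_dir() {
    nodes=${1:-1}
    # A non-numeric node count makes the test fail and falls back to /tmp.
    if [ "$nodes" -gt 1 ] 2>/dev/null; then
        printf '%s\n' "$2"          # multi-node: shared file system
    else
        printf '%s\n' "/tmp/$3"     # single node: node-local scratch is fine
    fi
}

# Assumed wiring, using variables that the posted log shows as set:
TMP_BASE_DIR=$(pick_base_dir "${MOAB_NODECOUNT:-1}" "${MOAB_SUBMITDIR:-$PWD}" "${USER:-$(id -un)}")
echo "TMP_BASE_DIR = ${TMP_BASE_DIR}"
```

With the values from the log (2 nodes, submit directory on /pfs/work2), this would return the workspace path instead of /tmp, so every rank could change into the same working directory.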

From carson.holt at genetics.utah.edu Wed Mar 29 12:12:35 2017
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 29 Mar 2017 18:12:35 +0000
Subject: [maker-devel] non-M gene models
In-Reply-To: <59ca4391-d32e-bfa8-4118-8c9586f3dfe4@email.arizona.edu>
References: <717138b6-fc7f-8f23-e550-c3019c4f96ec@email.arizona.edu> <59ca4391-d32e-bfa8-4118-8c9586f3dfe4@email.arizona.edu>
Message-ID: <0AD41A2D-9CFE-48DE-B338-F15D3A590B30@genetics.utah.edu>

Maybe. Those two options can result in a lot of partial models. Also, setting always_complete=1 will help some. Models without M at the start are generally partial models. There is often something about the contig that keeps it from being a whole model (a single-base-pair error that breaks the ORF or a splice site, or a string of N's that overlaps part of an exon). You can also try identifying InterPro domains and dropping any model without a defined domain (i.e. if it's going to be partial, at least make sure it's useful in its partial form).
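To see which models are affected before deciding how to handle them, the protein FASTA can be scanned for records that do not begin with M. This is a minimal sketch, not a MAKER tool: list_non_m is a made-up helper name, it assumes the record ID is the first word after '>', and it only looks at the first sequence line of each record, so wrapped sequences are handled.

```bash
#!/bin/sh
# Sketch: print the IDs of protein records whose sequence does not start with M.
# Reads FASTA on stdin; list_non_m is a hypothetical helper name.
list_non_m() {
    awk '
        /^>/ { id = substr($1, 2); want = 1; next }      # header: remember the ID
        want && NF {                                      # first sequence line of the record
            if (toupper(substr($0, 1, 1)) != "M") print id
            want = 0
        }
    '
}

# Small self-contained demonstration:
printf '>gene1-RA\nMSTK\n>gene2-RA\nLSTK\n' | list_non_m   # prints gene2-RA
```

The resulting ID list could then be compared against, for example, the models that lack an InterPro domain, which is the filter suggested above.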
--Carson

On Mar 29, 2017, at 4:23 AM, Dario Copetti wrote:

Looking at the config file again I notice this:

est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no

I usually turn them on only to get models from ESTs to train Augustus and SNAP: do you think that having these parameters on during the final annotation will produce the non-M models? If so, do you think that re-running MAKER with them turned off and using the MAKER-derived gff3 will clean out these models?
Can you elaborate a bit more on the usage of these two parameters?

Thanks,
Dario

On 3/29/2017 12:07 PM, Dario Copetti wrote:

Hi Carson,

We are ready to submit several different sets of annotations, but we are now stuck with the issue of having models whose protein sequence does not start with Met, and NCBI is picky about that.
Below I paste an example from a genome we are working on: as you see, most (95%) of the models start with M, but a significant fraction (almost 1500 models!) does not.
We used MAKER 2.31.8, specifying the option of having models that only start with M.

We realize that this issue may not be easy to fix - and also that there are indeed isoforms that do not start with M - but how would you fix this? Within or outside MAKER I mean, any help will be appreciated. Some time ago, Josh and Sharon (cc'd) fixed the models by having the CDS start at the first M that was in frame with the exon, and wrote a script for that.
Is this issue maybe fixed in a newer version of MAKER? How else would you fix it or deal with the NCBI genomes people?

Thanks,
Dario

grep -A1 ">" maker_proteins_161026.fasta | grep -v ">" | grep -v "\-\-" | cut -c1 | sort | uniq -c
    106 A
     33 C
     69 D
     88 E
     53 F
     94 G
     34 H
     86 I
     77 K
    144 L
  28245 M
     58 N
     72 P
     44 Q
     95 R
    142 S
     80 T
    114 V
     29 W
      6 X
     53 Y

--
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5
1657 E. Helen St.
Tucson, AZ 85721, USA
www.genome.arizona.edu

--
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5
1657 E. Helen St.
Tucson, AZ 85721, USA
www.genome.arizona.edu

From annabel.beichman at gmail.com Thu Mar 30 11:51:36 2017
From: annabel.beichman at gmail.com (Annabel Beichman)
Date: Thu, 30 Mar 2017 10:51:36 -0700
Subject: [maker-devel] RepeatMasker masking olfactory receptors
Message-ID: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>

Hi Carson,
I have a question about RepeatMasker within Maker:
I am finding that all class II olfactory receptors (families like OR2, OR5) are being masked by RepeatMasker as "RTE-BovB" repeats. This leads to them not being annotated by Maker. I don't expect my species (a mustelid) to have a large number of Bov-B repeats, and when I put the sequences annotated in my genome as RTE-BovB into Repbase's CENSOR only 13 out of 960 sequences have a hit to anything in Repbase. If I put those same sequences into NCBI blast, however, they all blast to olfactory receptors. I am finding the same pattern with another related mustelid de novo genome, and took the Ensembl ferret genome and ran it through the same pipeline and am finding a large number of Bov-B repeats there as well, despite there being none in the official annotation of that genome.

I used RepeatMasker with all species libraries, plus a custom library from RepeatModeler.

Any idea what might be going on?

Thanks so much!
~ Annabel

From 4urelie.K at gmail.com Thu Mar 30 12:54:07 2017
From: 4urelie.K at gmail.com (Aurelie K)
Date: Thu, 30 Mar 2017 12:54:07 -0600
Subject: [maker-devel] RepeatMasker masking olfactory receptors
In-Reply-To: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>
References: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>
Message-ID:

Hi Annabel,

I would run RM by specifying your (group of) species, using the -species option of RepeatMasker, especially if you have a custom de novo library.
This will limit the cross-masking by repeats that have been identified in other species.

Cheers,
Aurelie

On 30 March 2017 at 11:51, Annabel Beichman wrote:

> Hi Carson,
> I have a question about RepeatMasker within Maker:
> I am finding that all class II olfactory receptors (families like OR2,
> OR5) are being masked by RepeatMasker as "RTE-BovB" repeats. This leads to
> them not being annotated by Maker. I don't expect my species (a mustelid)
> to have a large number of Bov-B repeats, and when I put the sequences
> annotated in my genome as RTE-BovB into Repbase's CENSOR only 13 out of 960
> sequences have a hit to anything in Repbase. If I put those same sequences
> into NCBI blast, however, they all blast to olfactory receptors. I am
> finding the same pattern with another related mustelid de novo genome, and
> took the Ensembl ferret genome and ran it through the same pipeline and am
> finding a large number of Bov-B repeats there as well, despite there being
> none in the official annotation of that genome.
>
> I used RepeatMasker with all species libraries, plus a custom library from
> RepeatModeler.
>
> Any idea what might be going on?
>
> Thanks so much!
>
> ~ Annabel
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
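Restricting the repeat library to a clade, as suggested above, might look like the following when RepeatMasker is run by hand. This is only a sketch that assembles command lines and does not run anything. It assumes the clade restriction is given with RepeatMasker's -species flag and that the de novo library is applied with -lib in a separate pass; the clade name, file names and output directories are placeholders. Inside MAKER the analogous setting is presumably model_org in the options file, which is set to "all" in another thread of this archive.

```python
import shlex


def repeatmasker_commands(genome, species, custom_lib=None, threads=1):
    """Build RepeatMasker argv lists: a clade-restricted pass and, if a
    custom library is given, a second pass that uses only that library."""
    base = ["RepeatMasker", "-pa", str(threads), "-gff"]
    commands = [base + ["-species", species, "-dir", "rm_species", genome]]
    if custom_lib:
        commands.append(base + ["-lib", custom_lib, "-dir", "rm_custom", genome])
    return commands


if __name__ == "__main__":
    # "carnivora", "genome.fa" and "denovo_lib.fa" are hypothetical inputs.
    for argv in repeatmasker_commands("genome.fa", "carnivora",
                                      custom_lib="denovo_lib.fa", threads=4):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the two passes separate makes it easy to see which library is responsible for a given masked region, for example the olfactory receptor loci discussed in this thread.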
From rainer.rutka at uni-konstanz.de Wed Mar 1 05:30:39 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Wed, 1 Mar 2017 13:30:39 +0100
Subject: [maker-devel] Maker-Error when started with IMPI
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de>
 <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de>
 <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
Message-ID: <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>

Hi Carson.

Again THANK YOU for your efforts :-)

On 24.02.2017 at 18:30, Carson Holt wrote:
> Specific things.
>
> 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause problems with other MPI's.

OK, I deleted this environment variable. It is not set any more.

> 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has to be compiled for the flavor you are using, so make sure you have a separate installation of MAKER for Intel MPI). Also validate that the mpicc and libmpi.h listed during the MAKER install belong to Intel MPI. Don't just assume they do because you loaded the module. Manually verify the paths during MAKER's setup.

I validated:

UC:[kn at uc1n996 bwhpc-examples]$ module list
Currently Loaded Modulefiles:
  1) compiler/intel/16.0(default)
  2) mpi/impi/5.1.3-intel-16.0(default)

FOR MPICC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpicc
mpicc is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc

FOR LIBMPI:
UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64
UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h

Here I can find an mpi.h but not a libmpi.h. But I think this is OK, because the software was compiled and linked without any errors or missing libs.

> 3.
> The error you got previously should not even be possible with the current version of Intel MPI,
> which is why I say that when you called mpiexec, something else (that was not Intel MPI) was launched.
> Easy solution is to give the full path of mpiexec in your job, so you are not relying on PATH to be unaltered in your job.

mpiexec is in the PATH and the right one is/was used, too.

MPIEXEC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec
mpiexec is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
UC:[kn at bwhpc-examples]$

> Do not do -> mpiexec -nc 1 maker
> Do this for example -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker

OK, so I did:

[...]
#MSUB -l nodes=1:ppn=1
#MSUB -l mem=20gb
[...]
echo " "
echo "### Runing Maker example"
echo " "
export OMPI_MCA_mpi_warn_on_fork=0
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker
[...]

> 4. Build and run on the same node for your test. If you build on one node and run on another, you may
> be changing your environment in ways you don't realize that break things. So if you can build and test on
> the same node and it works, then it fails when you test it elsewhere, then you have to track down how your
> environment is changing.

OK, I did. Same node: uc1n996

UNFORTUNATELY I GOT THE SAME ERROR:

[...]
### Runing Maker example
LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-intel-16.0/lib/libmpi.so
OMPI_MCA_mpi_warn_on_fork=0
I_MPI_CPUINFO=proc
I_MPI_PMI_LIBRARY=/opt/bwhpc/common/mpi/openmpi/2.0.1-intel-16.0/lib/libpmi.so
I_MPI_PIN_DOMAIN=node
I_MPI_FABRICS=shm:tcp
I_MPI_HYDRA_IFACE=ib0
mpiexec_uc1n342.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n342.localdomain_kn_pop235844); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
[...]

> --Carson

tbc.
:-)
THANX

--
Rainer Rutka
Universität Konstanz
Kommunikations-, Informations-, Medienzentrum (KIM)
* KIM Ausbildung
* Wissenschaftliches Rechnen/bwHPC-C5
* KIM Basisdienste, KIM Support
Raum: V511
78457 Konstanz
+49 7531 88-5413

From rainer.rutka at uni-konstanz.de Wed Mar 1 05:51:05 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Wed, 1 Mar 2017 13:51:05 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de>
 <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de>
 <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
 <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
Message-ID: <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>

Sorry, sent wrong e-mail :-(

IGNORE THE FIRST MAIL I SENT!

On 01.03.2017 at 13:30, Rainer Rutka wrote:
Hi Carson.
Again THANK YOU for your efforts :-)
On 24.02.2017 at 18:30, Carson Holt wrote:
> Specific things.
>
> 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause
> problems with other MPI's.

OK, I deleted this environment variable. It is not set any more.

> 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has
> to be compiled for the flavor you are using, so make sure you have a
> separate installation of MAKER for Intel MPI). Also validate that the
> mpicc and libmpi.h listed during the MAKER install belong to Intel
> MPI. Don't just assume they do because you loaded the module. Manually
> verify the paths during MAKER's setup.

I validated:

UC:[kn at uc1n996 bwhpc-examples]$ module list
Currently Loaded Modulefiles:
  1) compiler/intel/16.0(default)
  2) mpi/impi/5.1.3-intel-16.0(default)

FOR MPICC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpicc
mpicc is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc

FOR LIBMPI:
UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64
UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h

Here I can find an mpi.h but not a libmpi.h. But I think this is OK, because the software was compiled and linked without any errors or missing libs.

> 3. The error you got previously should not even be possible with the
> current version of Intel MPI,
> which is why I say that when you called mpiexec, something else (that
> was not Intel MPI) was launched.
> Easy solution is to give the full path of mpiexec in your job, so you are
> not relying on PATH to be unaltered in your job.

mpiexec is in the PATH and the right one is/was used, too:

MPIEXEC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec
mpiexec is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec

> Do not do -> mpiexec -nc 1 maker
> Do this for example ->
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
> -nc maker

OK, so I did:

[...]
#MSUB -l nodes=1:ppn=1
#MSUB -l mem=20gb
[...]
echo " "
echo "### Runing Maker example"
echo " "
export OMPI_MCA_mpi_warn_on_fork=0
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker
[...]

> 4. Build and run on the same node for your test. If you build on one
> node and run on another, you may
> be changing your environment in ways you don't realize that break
> things.
> So if you can build and test on
> the same node and it works, then it fails when you test it elsewhere,
> then you have to track down how your
> environment is changing.

OK, I did. Same node: uc1n996

UNFORTUNATELY I GOT THE SAME ERROR:

[...]
Currently Loaded Modulefiles:
  1) compiler/intel/16.0(default)
  2) mpi/impi/5.1.3-intel-16.0(default)
  3) bio/maker/2.31.8_impi

### Display internal Maker/bwHPC environments...
MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bin
MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bwhpc-examples

### Runing Maker example
OMPI_MCA_mpi_warn_on_fork=0
I_MPI_CPUINFO=proc
I_MPI_PMI_LIBRARY=/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/lib/libmpi.so
I_MPI_PIN_DOMAIN=node
I_MPI_FABRICS=shm:tcp
I_MPI_HYDRA_IFACE=ib0
mpiexec_uc1n326.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n326.localdomain_kn_pop235844); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
### Cleaning up files ... removing unnecessary scratch files ...
[...]

> --Carson

tbc. :-)
THANX

--
Rainer Rutka
Universität Konstanz
Kommunikations-, Informations-, Medienzentrum (KIM)
* KIM Ausbildung
* Wissenschaftliches Rechnen/bwHPC-C5
* KIM Basisdienste, KIM Support
Raum: V511
78457 Konstanz
+49 7531 88-5413
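The checks discussed in this thread (LD_PRELOAD unset, and mpicc and mpiexec resolving into the Intel MPI installation rather than something else on PATH) could be scripted and run at the top of a job script. What follows is a minimal sketch under stated assumptions: the helper name is hypothetical, the caller supplies the expected installation prefix, and the function only inspects the environment without launching anything.

```python
import os
import shutil


def check_mpi_setup(expected_prefix, env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means the
    basic checks passed. `env` and `which` can be replaced for testing."""
    env = os.environ if env is None else env
    problems = []

    # LD_PRELOAD was only meant for the OpenMPI build, per the advice above.
    if env.get("LD_PRELOAD"):
        problems.append("LD_PRELOAD is set: " + env["LD_PRELOAD"])

    # Both tools should resolve to files below the expected MPI prefix.
    prefix = os.path.realpath(expected_prefix) + os.sep
    for tool in ("mpicc", "mpiexec"):
        path = which(tool)
        if path is None:
            problems.append(tool + " not found on PATH")
        elif not os.path.realpath(path).startswith(prefix):
            problems.append(tool + " resolves to " + path + ", outside " + expected_prefix)
    return problems


if __name__ == "__main__":
    # Prefix copied from the paths shown in this thread.
    impi = "/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64"
    for problem in check_mpi_setup(impi):
        print("WARNING:", problem)
```

Printing the result inside the batch job, rather than in an interactive shell, shows the environment the job actually sees, which is the point of the advice to stop relying on PATH.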
From carsonhh at gmail.com Wed Mar 1 13:32:54 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 13:32:54 -0700
Subject: [maker-devel] SOBA statistics of Maker annotation
In-Reply-To: <2377C5DD-569C-4248-B458-349D7AEA32F5@ucr.edu>
References: <688EB172-FEC8-4995-8AA2-0925AF62201A@ucr.edu>
 <6551374B-54FF-4047-B7A8-A49327FC0036@gmail.com>
 <73526BAB-57F8-4A47-AADD-DB6883573EAB@ucr.edu>
 <2377C5DD-569C-4248-B458-349D7AEA32F5@ucr.edu>
Message-ID: <6E776F59-F71F-49F7-872A-A0E404970C7E@gmail.com>

Perhaps, with the way you are counting sequence from the RepeatMasker report, you are double counting repeats that overlap? MAKER reports the command line it uses as part of its STDERR, so you can manually run any step you want outside of MAKER to evaluate.

--Carson

> On Feb 25, 2017, at 10:14 AM, Qihua Liang wrote:
>
> Thank you Barry and Carson!
>
> I compared the SOBA statistics of the RepeatMasker footprint with the report generated by running RepeatMasker alone, and I got 2 different percentages of repeats masked. Running RepeatMasker with myTrained.lib, the repeats masked are 42%. But within the Maker GFF3, the percentage of repeats masked is only ~18%. What may cause such a difference here?
>
> Thanks
> Qihua
>
>> On Feb 21, 2017, at 1:34 PM, Carson Holt wrote:
>>
>> MAKER merges overlapping RepeatMasker results into a single longer feature.
>>
>> --Carson
>>
>>
>>> On Feb 20, 2017, at 1:34 PM, Qihua Liang wrote:
>>>
>>> Hi Carson,
>>>
>>> Thanks for your reply! Now I understand the minimal length in the SOBA analysis of Maker gene models in GFF3.
>>>
>>> I am also using SOBA to calculate the statistics of other sources in the GFF3 file, and I have found another strange thing about the RepeatMasker annotation and footprint percentage.
>>>
>>> Previously, I ran RepeatMasker outside of Maker once, with my_trained.lib (same as used in Maker), and I had bases masked of ~42% from the output report.
>>> In running Maker, I provided both "model_org=all" and "rmlib=my_trained.lib". Under these settings, RepeatMasker should be run twice and the merged results of the two runs will be the output of RepeatMasker in GFF3. I am expecting the bases masked by RepeatMasker in the GFF3 to be more than 42%.
>>>
>>> But in the SOBA calculation, the footprint percentage is only ~18%. Referring to the SOBA paper, footprint is calculated as "non-redundant nucleotide count of all features of a given type". I assume that when SOBA calculates the footprint of RepeatMasker features in GFF3, it should be counting the same as "masked bps" in RepeatMasker itself.
>>>
>>> When Maker "combines" the 2 runs of RepeatMasker, is it a merge or an overlapping of the 2 RepeatMasker results?
>>> Besides, instead of using SOBA, are there any accessory scripts updated in Maker to calculate the statistics of the annotations?
>>>
>>> Thanks
>>> Qihua
>>>
>>>
>>>> On Feb 19, 2017, at 10:05 PM, Carson Holt wrote:
>>>>
>>>> In GFF3 the CDS and UTR lengths are actually the merge of all CDS or UTR features, but SOBA is reporting each part individually, which may be causing your confusion. This is because SOBA reports per-feature statistics and not merged-feature statistics.
>>>>
>>>> CDSs do not have to take up entire exons. For example start/stop codons may cross splice sites and be split across exons (very common). The result is that each part of the split CDS becomes a separate feature. As a result SOBA will treat each one separately. So a single bp CDS here is not abnormal, since the remaining part of the CDS continues on the next exon as a separate line. The exact same is true for UTR.
>>>>
>>>> If you want the merged length of the UTR and CDS, it is best to pull that info out of the _QI= part of the GFF3 attributes for each mRNA.
>>>>
>>>> What about single bp exons? Those cannot occur unless you gave an input GFF3 with predictions that have single bp exons. The predictors like SNAP and Augustus just won't produce them, with one exception. They can potentially produce them for the first/last exon. This is not because the exon is 1 bp, but rather because the predictor only reports the CDS part of the exon. As a result, the stop/start codon may have only 1 bp overlapping that exon, but once you add UTR the exon will extend from that point and will no longer be 1 bp in length. But if the UTR never gets added, then you can be left with a partial initial/terminal exon.
>>>>
>>>> However, more than likely what you are seeing is just related to how SOBA reports individual feature line stats as opposed to merged stats for CDS and UTR.
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>>> On Feb 18, 2017, at 9:43 AM, Qihua Liang wrote:
>>>>>
>>>>> Dear Maker develop team,
>>>>>
>>>>> I used the SOBA website to calculate the statistics of a Maker annotation, and I found that for some Maker features, like CDS, exon, 5' and 3' UTR, the minimal length can be as short as 1bp. Features with a length of 1bp are confusing. When Maker combines different gene models and makes such predictions, how will it accept such abnormal exon/CDS lengths? And are there any parameters in the bopt.ctl or evm.ctl to avoid such abnormal gene models?
>>>>>
>>>>> Thanks
>>>>> Qihua
>>>>
>>>
>>
>

From carsonhh at gmail.com Wed Mar 1 13:36:17 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 13:36:17 -0700
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To:
References:
Message-ID:

If you submit too many simultaneous MAKER runs, then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (MAKER must be configured and compiled to use MPI).

An example MPI launch command for running on 200 CPUs on a cluster ->
mpiexec -n 200 maker 2> maker_mpi1.error

--Carson

> On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>
> Hello:
>
> I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues with MPI, I submitted the Maker jobs several times under the same directory to the HPC. Following the example in the protocol (as shown below), when I submit the jobs I make them background processes with "&" except the first one. Is this necessary when I submit a job to an HPC? I found it took much, much longer than I expected (according to a test on a smaller data set). I am not sure whether setting the process as a background process leads to this issue?
>
> The example in the protocol
> % maker 2> maker1.error
> % maker 2> maker2.error &
> % maker 2> maker3.error &
> ......
>
> BTW, will the annotation of a shorter contig (e.g., 500bp) cost ~1/100 of the time needed to annotate a 50000bp contig? I am using SNAP for ab initio prediction and an RNA-seq assembly and protein sequences as evidence.
> I have more than half of the contigs shorter than 300bp (whose total length is only about 5% of the total length of all contigs); I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs.
>
> Thanks
>
> Best
> Quanwei

From qwzhang0601 at gmail.com Wed Mar 1 14:09:30 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Wed, 1 Mar 2017 16:09:30 -0500
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com>
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com>
Message-ID:

Thank you. I have submitted my jobs to our server. What I plan to do is this: (1) split the contigs into 50 files; (2) for each contig file, collect the annotation into GFF and the protein sequences into FASTA format; (3) manually merge the 50 GFF files and protein sequence files. Is what I am doing also correct?

Best
Quanwei

2017-03-01 15:54 GMT-05:00 Carson Holt :

> If you split into separate files, you can use the -g option to select the
> input file together with the -base option so all output goes to the same
> directory. Because they technically have different input files, this will
> avoid file locking issues. You have to use the -dsindex option at the end
> to rebuild the datastore index, so it looks like a single job. But that is
> one way to get around the issue.
>
> --Carson
>
>
>
> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>
> Thank you. But I met some problems with MPI on our server. So now I split
> my contigs into several files and annotate those files separately. After I
> finish the annotation on each file, I will merge the results.
>
> Thank you for your explanation!
>
> Best
> Quanwei
>
> 2017-03-01 15:36 GMT-05:00 Carson Holt :
>
>> If you submit too many simultaneous MAKER runs, then file locks will start
>> to collide and one run will slow down the others. You should submit fewer
>> simultaneous jobs and instead use MPI (MAKER must be configured and
>> compiled to use MPI).
>>
>> An example MPI launch command for running on 200 CPUs on a cluster ->
>> mpiexec -n 200 maker 2> maker_mpi1.error
>>
>> --Carson
>>
>>
>>
>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>> >
>> > Hello:
>> >
>> > I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues with MPI, I submitted the Maker jobs several times under the same directory to the HPC. Following the example in the protocol (as shown below), when I submit the jobs I make them background processes with "&" except the first one. Is this necessary when I submit a job to an HPC? I found it took much, much longer than I expected (according to a test on a smaller data set). I am not sure whether setting the process as a background process leads to this issue?
>> >
>> > The example in the protocol
>> > % maker 2> maker1.error
>> > % maker 2> maker2.error &
>> > % maker 2> maker3.error &
>> > ......
>> >
>> > BTW, will the annotation of a shorter contig (e.g., 500bp) cost ~1/100 of the time needed to annotate a 50000bp contig? I am using SNAP for ab initio prediction and an RNA-seq assembly and protein sequences as evidence. I have more than half of the contigs shorter than 300bp (whose total length is only about 5% of the total length of all contigs); I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs.
>> >
>> > Thanks
>> >
>> > Best
>> > Quanwei
>>
>>
>
>

From carsonhh at gmail.com Wed Mar 1 14:10:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 14:10:20 -0700
Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI
In-Reply-To:
References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com>
Message-ID: <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com>

That will work.

--Carson

> On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote:
>
> Thank you. I have submitted my jobs to our server. What I plan to do is this: (1) split the contigs into 50 files; (2) for each contig file, collect the annotation into GFF and the protein sequences into FASTA format; (3) manually merge the 50 GFF files and protein sequence files. Is what I am doing also correct?
>
> Best
> Quanwei
>
> 2017-03-01 15:54 GMT-05:00 Carson Holt:
> If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue.
>
> --Carson
>
>
>
>> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote:
>>
>> Thank you. But I met some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation on each file, I will merge the results.
>>
>> Thank you for your explanation!
>>
>> Best
>> Quanwei
>>
>> 2017-03-01 15:36 GMT-05:00 Carson Holt:
>> If you submit too many simultaneous MAKER runs, then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (MAKER must be configured and compiled to use MPI).
>>
>> An example MPI launch command for running on 200 CPUs on a cluster ->
>> mpiexec -n 200 maker 2> maker_mpi1.error
>>
>> --Carson
>>
>>
>>
>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote:
>> >
>> > Hello:
>> >
>> > I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues with MPI, I submitted the Maker jobs several times under the same directory to the HPC. Following the example in the protocol (as shown below), when I submit the jobs I make them background processes with "&" except the first one. Is this necessary when I submit a job to an HPC? I found it took much, much longer than I expected (according to a test on a smaller data set). I am not sure whether setting the process as a background process leads to this issue?
>> >
>> > The example in the protocol
>> > % maker 2> maker1.error
>> > % maker 2> maker2.error &
>> > % maker 2> maker3.error &
>> > ......
>> >
>> > BTW, will the annotation of a shorter contig (e.g., 500bp) cost ~1/100 of the time needed to annotate a 50000bp contig? I am using SNAP for ab initio prediction and an RNA-seq assembly and protein sequences as evidence. I have more than half of the contigs shorter than 300bp (whose total length is only about 5% of the total length of all contigs); I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs.
>> >
>> > Thanks
>> >
>> > Best
>> > Quanwei
>>
>>
>
>

From carsonhh at gmail.com Wed Mar 1 17:43:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Mar 2017 17:43:30 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de>
 <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de>
 <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de>
 <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de>
 <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de>
Message-ID: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>

Try this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

If both of these fail, there is a chance that the Intel MPI you are using was compiled on a different architecture than the one you are launching it on. In that case the failure indicates a need to reinstall Intel MPI for that architecture.

The following may or may not work if the first two fail:

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello

Then this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

Also send me this file ->
perl/lib/MAKER/ConfigData.pm

Thanks,
Carson

> On Mar 1, 2017, at 5:51 AM, Rainer Rutka wrote:
>
>
> Sorry, sent wrong e-mail :-(
>
> IGNORE THE FIRST MAIL I SENT!
>
> On 01.03.2017 at 13:30, Rainer Rutka wrote:
> Hi Carson.
> Again THANK YOU for your efforts :-)
> On 24.02.2017 at 18:30, Carson Holt wrote:
>> Specific things.
>>
>> 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause
>> problems with other MPI's.
>
> OK, I deleted this environment variable. It is not set any more.
>
>> 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has
>> to be compiled for the flavor you are using, so make sure you have a
>> separate installation of MAKER for Intel MPI). Also validate that the
>> mpicc and libmpi.h listed during the MAKER install belong to Intel
>> MPI. Don't just assume they do because you loaded the module. Manually
>> verify the paths during MAKER's setup.
>
> I validated:
> UC:[kn at uc1n996 bwhpc-examples]$ module list
> Currently Loaded Modulefiles:
> 1) compiler/intel/16.0(default)
> 2) mpi/impi/5.1.3-intel-16.0(default)
> FOR MPICC:
> UC:[kn at uc1n996 bwhpc-examples]$ type mpicc
> mpicc is
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc
> FOR LIBMPI:
> UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64
> UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h
> Here I can find an mpi.h but not a libmpi.h.
> But I think this is OK, because the software was compiled and linked
> without any errors or missing libs.
>
>> 3. The error you got previously should not even be possible with the
>> current version of Intel MPI,
>> which is why I say that when you called mpiexec, something else (that
>> was not Intel MPI) was launched.
>> Easy solution is to give the full path of mpiexec in your job, so you are
>> not relying on PATH to be unaltered in your job.
>
> mpiexec is in the PATH and the right one is/was used, too:
> MPIEXEC:
> UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec
> mpiexec is
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
>
>> Do not do -> mpiexec -nc 1 maker
>> Do this for example ->
>> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
>> -nc maker
> OK, so I did:
> [...]
> #MSUB -l nodes=1:ppn=1
> #MSUB -l mem=20gb
> [...]
> echo " "
> echo "### Runing Maker example"
> echo " "
> export OMPI_MCA_mpi_warn_on_fork=0
> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
> -nc maker
> [...]
>
>> 4. Build and run on the same node for your test. If you build on one
>> node and run on another, you may
>> be changing your environment in ways you don't realize that break
>> things. So if you can build and test on
>> the same node and it works, then it fails when you test it elsewhere,
>> then you have to track down how your
>> environment is changing.
>
> OK, I did. Same node: uc1n996
> UNFORTUNATELY I GOT THE SAME ERROR:
> [...]
> Currently Loaded Modulefiles:
> 1) compiler/intel/16.0(default)
> 2) mpi/impi/5.1.3-intel-16.0(default)
> 3) bio/maker/2.31.8_impi
>
>
> ### Display internal Maker/bwHPC environments...
>
> MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bin
> MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.8_impi/bwhpc-examples
>
>
> ### Runing Maker example
> OMPI_MCA_mpi_warn_on_fork=0
> I_MPI_CPUINFO=proc
> I_MPI_PMI_LIBRARY=/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/lib/libmpi.so
> I_MPI_PIN_DOMAIN=node
> I_MPI_FABRICS=shm:tcp
> I_MPI_HYDRA_IFACE=ib0
> mpiexec_uc1n326.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n326.localdomain_kn_pop235844); possible causes:
> 1. no mpd is running on this host
> 2. an mpd is running but was started without a "console" (-n option)
> ### Cleaning up files ... removing unnecessary scratch files ...
> [...]
>
>> --Carson
> tbc. :-)
> THANX
>
> --
> Rainer Rutka
> University of Konstanz
> Communication, Information, Media Centre (KIM)
> * KIM Training
> * Scientific Computing/bwHPC-C5
> * KIM Basic Services, KIM Support
> Room: V511
> 78457 Konstanz
> +49 7531 88-5413
>

From rainer.rutka at uni-konstanz.de Thu Mar 2 01:41:37 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Thu, 2 Mar 2017 09:41:37 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
Message-ID:

Hi Carson!

On 02.03.2017 at 01:43, Carson Holt wrote:
> Try this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello
> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h
Same error(s).

> If both of these fail, there is the chance that the Intel MPI you are using was compiled on a different architecture than the one you are launching it on. In that case the failure indicates a need to reinstall Intel MPI for that architecture.
Yes, they fail.

> The following may or may not work if the first two fail:
> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello
WORKS FINE!

> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h
WORKS!

> Also send me this file -> perl/lib/MAKER/ConfigData.pm
Attached to this mail.

> Thanks,
> Carson

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ConfigData.pm
Type: application/x-perl
Size: 5424 bytes
Desc: not available
URL:

From rainer.rutka at uni-konstanz.de Thu Mar 2 02:07:07 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Thu, 2 Mar 2017 10:07:07 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
Message-ID: <6cd0a8c5-e6a5-a171-5f80-11d193627aeb@uni-konstanz.de>

> The following may or may not work if the first two fail:
> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 echo Hello
> Then this command -> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec.hydra -n 2 /opt/bwhpc/common/bio/maker/2.31.8_impi/bin/maker -h

mpirun (not mpiexec) is working, too!

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413

From carsonhh at gmail.com Thu Mar 2 10:41:35 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 2 Mar 2017 10:41:35 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com>
Message-ID: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>

This command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

All that command does is start the launcher and print "Hello". So since it failed, it means the issue is with your MPI installation (i.e. Intel MPI itself). It would have to be reinstalled and recompiled. I would not be surprised if the issues with the other MPI flavors you tried were for the same reason. They were installed for one architecture/compiler/library set, but you are running them on another one. So they always fail.

The second command was an alternate launcher, but it relies on the same underlying libraries as the first one. So if the first one failed, the second one may fail (it may just happen later on).

So the issue boils down to one thing -> Your MPI is the issue. You need to reinstall/reconfigure, and once you can get your MPI working, you can move on to trying MAKER.

Thanks,
Carson

From mnaymik at tgen.org Thu Mar 2 13:05:22 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Thu, 2 Mar 2017 13:05:22 -0700
Subject: [maker-devel] ThrowNullPointerException()
Message-ID:

I have maker running with MPI and I get this error over and over again for every contig. Any ideas?

MAKER WARNING: All old files will be erased before continuing
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: 5239
Length: 1395
#---------------------------------------------------------------------
Error: NCBI C++ Exception:
"/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti

From carsonhh at gmail.com Thu Mar 2 13:25:59 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 2 Mar 2017 13:25:59 -0700
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To:
References:
Message-ID: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>

Try reinstalling blast, or upgrade to a newer version of blast.

--Carson

From d.ence at ufl.edu Fri Mar 3 09:48:34 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Fri, 3 Mar 2017 16:48:34 +0000
Subject: [maker-devel] how to deal with Contigs to run maker?
In-Reply-To: <2017022815435664227911@cau.edu.cn>
References: <2017022815435664227911@cau.edu.cn>
Message-ID: <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>

Hi Chao,

I don't think merging the contigs is a good idea. Unless you actually know the distances (in base pairs) between the contigs, this could lead to many spurious alignments. I think you should leave them separate in your fasta file for RepeatModeler, ab initio training, and running maker. If you're worried about short contigs in your assembly, you can exclude shorter contigs with the min_contig option in the maker_opts control file.

~Daniel

On Feb 28, 2017, at 2:43 AM, dcg at cau.edu.cn wrote:

Dear sir:
After assembling, I got many contigs and their order in each chromosome.
What I have done is merge these contigs into chromosomes following that order, with 100 Ns inserted between adjacent contigs, so that I got chr1, chr2, ... Then I ran RepeatModeler and the gene predictor to annotate it.

Could my way reach a high-quality result? Should I use all the contigs to mask repeats and train the predictor?
Is there any better way to do genome-wide annotation?

I'm looking forward to your reply!
Best wishes!

Chao Chao
2017.02.28

From carsonhh at gmail.com Fri Mar 3 10:32:15 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 3 Mar 2017 10:32:15 -0700
Subject: [maker-devel] how to deal with Contigs to run maker?
In-Reply-To: <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>
References: <2017022815435664227911@cau.edu.cn> <186210C2-8F02-4ED3-8820-7567648207F1@mail.ufl.edu>
Message-ID: <7CF3A765-5A93-42B2-AA28-4596CD25A459@gmail.com>

I agree. Also, a 100 bp insert of Ns will essentially be ignored by aligners and predictors. They'll jump across it as if it were just an intron, resulting in false merges and bad predictions.

--Carson

From rainer.rutka at uni-konstanz.de Mon Mar 6 01:21:20 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Mon, 6 Mar 2017 09:21:20 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID:

Hi Carson.

Again thank you for your response.

But, sorry to say, it's not possible that our MPI is corrupt.
We have approx. 1,500 users working on our bwUniCluster so far. 95% of these users use MPI. And all our other software (see: cis-hpc.uni-konstanz.de) is running with our implementations of IMPI/OMPI without any issues.

:-()

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* KIM Training
* Scientific Computing/bwHPC-C5
* KIM Basic Services, KIM Support
Room: V511
78457 Konstanz
+49 7531 88-5413

From carsonhh at gmail.com Mon Mar 6 07:47:51 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 6 Mar 2017 07:47:51 -0700
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID: <9B00FB6A-B5F5-4240-AB1E-4CBEEEB63C7F@gmail.com>

I was able to replicate the error as follows ->

1. Intel MPI installed on CentOS kernel 6 (MPI works fine).
2. Upgrade to kernel 7 without reinstalling, and Intel MPI reports the same error as reported by the user.
3. After recompiling Intel MPI on kernel 7, the error goes away.

The proof that there is an issue with your Intel MPI installation is in this command ->
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -n 2 echo Hello

That command is simply trying to get mpiexec to launch "echo Hello" internally. And it failed. It's as simple as that.
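
The launcher test described here can be scripted so it is easy to repeat on each node. A minimal sketch, assuming Python 3 is available on the node; the function name `launcher_works` is hypothetical and not part of MAKER or Intel MPI. It only runs `<launcher> -n 2 echo Hello` with the full launcher path and reports whether the launcher exited cleanly, so it exercises the MPI launcher and nothing else.

```python
import subprocess
import sys


def launcher_works(launcher, timeout=60):
    """Return True if `<launcher> -n 2 echo Hello` exits with status 0.

    `launcher` should be the full path to an MPI launcher (for example
    mpiexec or mpiexec.hydra), so the result does not depend on PATH.
    """
    cmd = [launcher, "-n", "2", "echo", "Hello"]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Launcher missing, not executable, or hung (e.g. waiting for a daemon).
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Usage: python check_launcher.py /full/path/to/mpiexec /full/path/to/mpiexec.hydra
    for path in sys.argv[1:]:
        status = "OK" if launcher_works(path) else "FAILED"
        print(f"{path}: {status}")
```

Running it with both launcher paths from this thread on the build node and on a compute node would show whether the failure follows the launcher or the node.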

Thanks,
Carson

From dussert.yann at gmail.com Mon Mar 6 09:51:59 2017
From: dussert.yann at gmail.com (YannDussert)
Date: Mon, 6 Mar 2017 17:51:59 +0100
Subject: [maker-devel] Differences in non_overlapping protein file between runs
Message-ID: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>

Hello,

First, thank you for developing MAKER, this is a great annotation tool!

I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set.
I then used augustus, snap (both trained with models from the first round) and genemark for ab initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.

After the second round, I had only around 11000 annotated genes (96% completeness with BUSCO v2), whereas I'm expecting between 13000 and 17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation in a genome browser, one of the problems was apparently gene fusions due to bad protein evidence.

Following the advice in another post, I tried running MAKER by passing the ab initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence; did I miss something?

Thank you in advance for your answer.

Best regards,
Yann

From dcg at cau.edu.cn Sun Mar 5 04:26:59 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Sun, 5 Mar 2017 19:26:59 +0800
Subject: [maker-devel] For help about masking repeats before annotation
Message-ID: <2017030519265949065818@cau.edu.cn>

Dear sir:

Before the maker operations, I do repeat masking on my contigs first. However, when I followed "Repeat Library Construction-Advanced", no results were generated after running LTRharvest, so I couldn't go any further.
When I attempted to follow "Repeat Library Construction-Basic" to run RepeatModeler, a note caught my attention, even though RECON returned some results:

NOTE: RepeatScout did not return any models.

Is the situation above normal in the masking process? How can I deal with these problems to make a high-quality repeat library for my assembled contigs?

Hope to hear from you.
Best wishes!

Chao Chao
2017.03.05

From dcg at cau.edu.cn Mon Mar 6 05:24:17 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Mon, 6 Mar 2017 20:24:17 +0800
Subject: [maker-devel] How to merge the annotation results into chromosomes?
Message-ID: <2017030620241723514513@cau.edu.cn>

Dear sir:

Hello, I am doing my utmost to study annotation now. However, I have recently been confused about handling the results.
After alignment, training and curation, we can get good gene models and merge them with gff3_merge and fasta_merge.
But how can I merge them into different chromosomes like Homo_sapiens.GRCh38.87.chromosome.11.gff3.gz? I don't just want results for separate contigs.

I'm looking forward to your reply. Thanks a lot!
Best wishes!

Chao Chao
2017.03.06

From lucys-world at mailbox.org Mon Mar 6 07:40:33 2017
From: lucys-world at mailbox.org (lucys-world at mailbox.org)
Date: Mon, 6 Mar 2017 15:40:33 +0100 (CET)
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
Message-ID: <850873370.6534.1488811234072@office.mailbox.org>

Dear maker-devel group,

I have some issues with my maker ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP.
After two maker runs I wanted to create a new HMM for the third maker run, but the command

fathom genome.ann genome.dna -gene-stats

resulted in 0 genes.

What I have done so far:

* For the first training run I only used BUSCO and the Swiss-Prot database as references (since no ESTs are available for my species).
  Additionally I set protein2genome=1.
* I was able to create an HMM based on all merged *.gff files, but these were not many:
  o Out of 27,032 scaffolds (sequences) only 280 were used for the HMM; here are the gene-stats:
  o 280 sequences
    0.458676 avg GC fraction (min=0.338014 max=0.708052)
    7445 genes (plus=3192 minus=4253)
    1621 (0.217730) single-exon
    5824 (0.782270) multi-exon
    168.412018 mean exon (min=1 max=5224)
    1464.349243 mean intron (min=30 max=41197)
* For the second maker run I then used this HMM and again the BUSCO+SwissPort.fasta reference file.
  o The gene-stats for the output of the second maker run are:
  o 282 sequences
    0.473125 avg GC fraction (min=0.338014 max=0.725131)
    0 genes (plus=0 minus=0)
    0 (-nan) single-exon
    0 (-nan) multi-exon
    -nan mean exon (min=2147483647 max=0)
    -nan mean intron (min=2147483647 max=0)

Would you recommend rerunning everything, e.g. with an additional Augustus gene prediction (species=human), or ESTs from a related species? (If so, how closely related?)

Thank you for your time and help.

Kind regards,
Lucy

From d.ence at ufl.edu Mon Mar 6 10:11:57 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Mon, 6 Mar 2017 17:11:57 +0000
Subject: [maker-devel] How to merge the annotation results into chromosomes?
In-Reply-To: <2017030620241723514513@cau.edu.cn>
References: <2017030620241723514513@cau.edu.cn>
Message-ID: <45D1390D-212D-42A4-9819-C0045601B013@mail.ufl.edu>

Hi,

Do you have data that can precisely place each of your contigs in its position on the chromosome? Without that, this isn't even possible, since a gff3 file with the chromosomes instead of the contigs requires each contig's position in the chromosome. And in any case, I don't think there is a script in the maker tools that does what you're asking. Maybe someone else has made a script to do that.
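
If such placement data does exist, the coordinate conversion itself is simple arithmetic. A minimal sketch in Python, which is not part of the maker tools: it assumes a hypothetical placement table giving, for each contig, its chromosome, the 1-based chromosome position of the leftmost base the contig occupies, the contig length, and its orientation (the kind of information an AGP file carries). The names `Placement` and `lift_gff3_line` are made up for illustration. Features on reverse-oriented contigs get mirrored coordinates and a flipped strand.

```python
from collections import namedtuple

# Hypothetical placement record (an assumption, not a MAKER data structure).
# chrom_start: 1-based chromosome coordinate of the leftmost base covered
# by the contig; orientation: "+" or "-" relative to the chromosome.
Placement = namedtuple("Placement", "chrom chrom_start length orientation")


def lift_gff3_line(line, placements):
    """Convert one GFF3 feature line from contig to chromosome coordinates.

    Comment lines, blank lines, and anything that is not a nine-column
    feature line are returned unchanged. Pragmas such as
    ##sequence-region are NOT rewritten by this sketch.
    Raises KeyError if the contig has no placement.
    """
    if line.startswith("#") or not line.strip():
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line
    place = placements[cols[0]]
    start, end = int(cols[3]), int(cols[4])
    if place.orientation == "+":
        new_start = place.chrom_start + start - 1
        new_end = place.chrom_start + end - 1
    else:
        # Mirror the interval within the contig, then shift it.
        new_start = place.chrom_start + place.length - end
        new_end = place.chrom_start + place.length - start
        cols[6] = {"+": "-", "-": "+"}.get(cols[6], cols[6])
    cols[0] = place.chrom
    cols[3], cols[4] = str(new_start), str(new_end)
    return "\t".join(cols) + "\n"


if __name__ == "__main__":
    # Toy example: a 100 bp contig placed in reverse at chr1:201-300.
    demo = {"contig2": Placement("chr1", 201, 100, "-")}
    print(lift_gff3_line("contig2\tmaker\tgene\t10\t20\t.\t+\t.\tID=g2\n", demo), end="")
```

Without a trustworthy placement table this conversion should not be attempted, for the reason given in the message.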

~Daniel

From d.ence at ufl.edu Mon Mar 6 10:15:07 2017
From: d.ence at ufl.edu (Ence,daniel)
Date: Mon, 6 Mar 2017 17:15:07 +0000
Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP
In-Reply-To: <850873370.6534.1488811234072@office.mailbox.org>
References: <850873370.6534.1488811234072@office.mailbox.org>
Message-ID: <970801D9-536E-494C-B5C7-F5F72125FAFC@mail.ufl.edu>

Hi Lucy,

What were your settings for the second training run? Did you leave protein2genome=1?

~Daniel
Additionally I set protein2genome =1 * I was able to create an HMM based on all merged *.gff But these were not many: * out of 27.032 Scafolds (Sequences) only 280 were used for the HMM; here the gene-stats: * 280 sequences 0.458676 avg GC fraction (min=0.338014 max=0.708052) 7445 genes (plus=3192 minus=4253) 1621 (0.217730) single-exon 5824 (0.782270) multi-exon 168.412018 mean exon (min=1 max=5224) 1464.349243 mean intron (min=30 max=41197) * For the second maker run I then used this HMM and again the BUSCO+SwissPort.fasta reference file. * the gene-stats for the output of the second maker run are: * 282 sequences 0.473125 avg GC fraction (min=0.338014 max=0.725131) 0 genes (plus=0 minus=0) 0 (-nan) single-exon 0 (-nan) multi-exon -nan mean exon (min=2147483647 max=0) -nan mean intron (min=2147483647 max=0) Would you recommend to rerun everything, e.g. with an additional Augustus gene prediction (species=human), or EST from related species? (If so how close related?) Thank you for your time and help kind regards Lucy _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Mar 6 12:48:49 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 6 Mar 2017 12:48:49 -0700 Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP In-Reply-To: <850873370.6534.1488811234072@office.mailbox.org> References: <850873370.6534.1488811234072@office.mailbox.org> Message-ID: <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com> It looks like you have no genes to train with. So you did something wrong on your second run. Either no gene predictor was running or you provided no evidence for the predictor, so you produced no models. 
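One way to check whether a run produced any models before training on it is to count the features in the merged GFF3 by source and type. The following is a minimal Python sketch, not a MAKER tool. The function name count_features is hypothetical, and it assumes the usual MAKER convention that final gene models carry the source "maker" in column 2, while evidence and ab initio predictions carry sources such as "protein2genome" or "snap_masked".

```python
from collections import Counter
from typing import Iterable


def count_features(gff3_lines: Iterable[str]) -> Counter:
    """Count GFF3 features by (source, type), i.e. columns 2 and 3.

    Stops at the ##FASTA directive so embedded sequence is never parsed;
    comment lines, blank lines and malformed lines are skipped.
    """
    counts: Counter = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        counts[(cols[1], cols[2])] += 1
    return counts
```

For example, after opening a merged file (the name genome.all.gff is an assumption) and passing the file handle to count_features, a count of zero for ("maker", "gene") would match the "0 genes" reported by fathom, and the counts for the other sources would show whether any predictor or evidence alignment ran at all.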
--Carson > On Mar 6, 2017, at 7:40 AM, lucys-world at mailbox.org wrote: > > Dear maker-devel group, > > > > I have some issues with my maker ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP. > > after two maker runs I wanted to create a new HMM for the third maker run, but the command > > > > fathom genome.ann genoma.dna -gene-stats > > > > resulted in 0 genes. > > > > What have I done so far: > > for the first training run I only used BUSCO and Swiss-Port data bank as references (Since no EST are available for my species). Additionally I set protein2genome =1 > > > I was able to create an HMM based on all merged *.gff But these were not many: > out of 27.032 Scafolds (Sequences) only 280 were used for the HMM; here the gene-stats: > 280 sequences > 0.458676 avg GC fraction (min=0.338014 max=0.708052) > 7445 genes (plus=3192 minus=4253) > 1621 (0.217730) single-exon > 5824 (0.782270) multi-exon > 168.412018 mean exon (min=1 max=5224) > 1464.349243 mean intron (min=30 max=41197) > > > For the second maker run I then used this HMM and again the BUSCO+SwissPort.fasta reference file. > the gene-stats for the output of the second maker run are: > 282 sequences > 0.473125 avg GC fraction (min=0.338014 max=0.725131) > 0 genes (plus=0 minus=0) > 0 (-nan) single-exon > 0 (-nan) multi-exon > -nan mean exon (min=2147483647 max=0) > -nan mean intron (min=2147483647 max=0) > > > Would you recommend to rerun everything, e.g. with an additional Augustus gene prediction (species=human), or EST from related species? (If so how close related?) > > > > Thank you for your time and help > > kind regards > > Lucy > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
URL: From qwzhang0601 at gmail.com Tue Mar 7 08:14:11 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 7 Mar 2017 10:14:11 -0500 Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI In-Reply-To: <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com> References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com> Message-ID: Hi Carson: I split my contigs into 50 files and annotated them in parallel. After the annotation finished, I used "gff3_merge -d" and "fasta_merge -d" to get the gff and fasta files for each of the 50 files. Now I am trying to merge those gff files into one gff. But I found that the contig sequences are appended to the gff files after the annotation information. So I think I cannot simply merge them using the command "cat file1.gff file2.gff ...file50.gff > merged.gff". I am considering two ways to merge those files; could you please tell me which one works? (1) If the contig sequences will not be useful for downstream functional annotation, then I want to remove all the contig sequences from those gff files, and then merge the gff files containing only annotation information using the "cat" command. (2) Merge the annotation part and the contig sequence part (from those 50 gff files) separately, then merge the two files (i.e., the file including all annotation information, and the file including all the contig sequences) by adding the contig sequences to the end of the annotation information. Thanks 2017-03-01 16:10 GMT-05:00 Carson Holt : > That will work. > > --Carson > > On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote: > > Thank you. I have submitted my jobs to our server. What I plan to do is like > this: (1) split contigs into 50 files; (2) for each contig file, I > collected the annotation into gff and protein sequences into fasta format; > (3) manually merge the 50 gff files and protein sequences files. Is what I > am doing also correct?
> > Best > Quanwei > > 2017-03-01 15:54 GMT-05:00 Carson Holt : > >> If you split into separate files, you can use the -g option to select the >> input file together with the -base option so all output goes to the same >> directory. Because they technically have different input files, this will >> avoid file locking issues. You have to use the -dsindex option at the end >> to rebuild the datastore index, so it looks like a single job. But that is >> one way to get around the issue. >> >> ?Carson >> >> >> >> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote: >> >> Thank you. But I met some problems with MPI on our server. So now I >> split my contigs into several files and annotate those files separately. >> After I finish the annotation on each file, I will merge the results. >> >> Thank you for your explanation! >> >> Best >> Quanwei >> >> 2017-03-01 15:36 GMT-05:00 Carson Holt : >> >>> If you submit too many simultaneous, MAKER run then file locks will >>> start to collide and one run will slow down the others. You should submit >>> fewer simultaneous jobs and instead use MPI (maker must be configured and >>> compiled to use MPI). >>> >>> An example MPI launch command for running on 200 CPUs on a cluster ?> >>> mpiexec -n 200 maker 2> maker_mpi1.error >>> >>> ?Carson >>> >>> >>> >>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang >>> wrote: >>> > >>> > Hello: >>> > >>> > I am doing genome annotation using Maker on our high performance >>> computational cluster (HPC). Due to some issues of MPI, I submitted the >>> Maker jobs several times under the same directory to HPC. Followed by the >>> example in the protocol (as shown below), when I submit the jobs I make >>> them as background processes by "&" except the first one. Is this necessary >>> when I submit a job to a HPC? I found it costed much much longer time than >>> I expected (according to a testing on a smaller data set). I am not sure >>> whether setting the process as background process lead to this issue? 
>>> > >>> > The example in the protocol >>> > % maker 2> maker1.error >>> > % maker 2> maker2.error & >>> > % maker 2> maker3.error & >>> > ...... >>> > >>> > BTW, will the annotation on shorter contig (e.g., 500bp) cost ~ 1/100 >>> of the time that cost for annotation a 50000bp contig? I am using SNAP for >>> an inito and RNA-seq assembly and protein sequences as evidence. I have >>> more than half contigs shorter than 300bp (whose total length is only about >>> 5% of the total length of all contigs), I want to know whether I can save >>> about half (or only about 5%) of the time if I ignore those short contigs. >>> > >>> > Thanks >>> > >>> > Best >>> > Quanwei >>> > _______________________________________________ >>> > maker-devel mailing list >>> > maker-devel at box290.bluehost.com >>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yand >>> ell-lab.org >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 7 08:35:42 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 7 Mar 2017 08:35:42 -0700 Subject: [maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI In-Reply-To: References: <9CD22E61-AC30-4749-AFB1-A450BF30413E@gmail.com> <123F86EE-C576-4126-8D77-1964551B71C1@gmail.com> Message-ID: Use gff3_merge again without the -d option. Just give it all 50 files. --Carson Sent from my iPhone > On Mar 7, 2017, at 8:14 AM, Quanwei Zhang wrote: > > Hi Carson: > > I split my contigs into 50 files and annotated them parallelized. After annotation finish, I used "gff3_merge -d" and "fasta_merge -d" to get the gff and fasta files for each of the 50 files. Now I am trying to merge those gff files into one gff. But I found behind the annotation information, the contig sequences are attached into the gff files. So I think I can not simply merge them using the command "cat file1.gff file2.gff ...file50.gff > merged.gff". 
So I am considering to merge those files in two ways, would you please give me a suggestion (which works)? > (1) If the contigs sequences will not be useful for downstream functional annotation, then I want to remove all the contig sequences from those gff, and then merge gff file with only annotation information using "cat" command. > (2) Merge the annotation part and the contig sequences part (from those 50 gff files) separately, then merge the two file (i.e., the file including all annotation information, and the file including all the contigs sequences) by adding the contig sequence to the end of annotation information. > > Thanks > > > > 2017-03-01 16:10 GMT-05:00 Carson Holt : >> That will work. >> >> ?Carson >> >>> On Mar 1, 2017, at 2:09 PM, Quanwei Zhang wrote: >>> >>> Thank you. I have submit my jobs to our server. What I plan to do is like this: (1) split contigs into 50 files; (2) for each contig file, I collected the annotation into gff and protein sequences into fasta format; (3) manually merge the 50 gff files and protein sequences files. Is what I am doing also correct? >>> >>> Best >>> Quanwei >>> >>> 2017-03-01 15:54 GMT-05:00 Carson Holt : >>>> If you split into separate files, you can use the -g option to select the input file together with the -base option so all output goes to the same directory. Because they technically have different input files, this will avoid file locking issues. You have to use the -dsindex option at the end to rebuild the datastore index, so it looks like a single job. But that is one way to get around the issue. >>>> >>>> ?Carson >>>> >>>> >>>> >>>>> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang wrote: >>>>> >>>>> Thank you. But I met some problems with MPI on our server. So now I split my contigs into several files and annotate those files separately. After I finish the annotation on each file, I will merge the results. >>>>> >>>>> Thank you for your explanation! 
>>>>> >>>>> Best >>>>> Quanwei >>>>> >>>>> 2017-03-01 15:36 GMT-05:00 Carson Holt : >>>>>> If you submit too many simultaneous, MAKER run then file locks will start to collide and one run will slow down the others. You should submit fewer simultaneous jobs and instead use MPI (maker must be configured and compiled to use MPI). >>>>>> >>>>>> An example MPI launch command for running on 200 CPUs on a cluster ?> >>>>>> mpiexec -n 200 maker 2> maker_mpi1.error >>>>>> >>>>>> ?Carson >>>>>> >>>>>> >>>>>> >>>>>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang wrote: >>>>>> > >>>>>> > Hello: >>>>>> > >>>>>> > I am doing genome annotation using Maker on our high performance computational cluster (HPC). Due to some issues of MPI, I submitted the Maker jobs several times under the same directory to HPC. Followed by the example in the protocol (as shown below), when I submit the jobs I make them as background processes by "&" except the first one. Is this necessary when I submit a job to a HPC? I found it costed much much longer time than I expected (according to a testing on a smaller data set). I am not sure whether setting the process as background process lead to this issue? >>>>>> > >>>>>> > The example in the protocol >>>>>> > % maker 2> maker1.error >>>>>> > % maker 2> maker2.error & >>>>>> > % maker 2> maker3.error & >>>>>> > ...... >>>>>> > >>>>>> > BTW, will the annotation on shorter contig (e.g., 500bp) cost ~ 1/100 of the time that cost for annotation a 50000bp contig? I am using SNAP for an inito and RNA-seq assembly and protein sequences as evidence. I have more than half contigs shorter than 300bp (whose total length is only about 5% of the total length of all contigs), I want to know whether I can save about half (or only about 5%) of the time if I ignore those short contigs. 
>>>>>> > >>>>>> > Thanks >>>>>> > >>>>>> > Best >>>>>> > Quanwei >>>>>> > _______________________________________________ >>>>>> > maker-devel mailing list >>>>>> > maker-devel at box290.bluehost.com >>>>>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From chrisi.hahni at gmail.com Tue Mar 7 17:51:00 2017 From: chrisi.hahni at gmail.com (Christoph Hahn) Date: Wed, 8 Mar 2017 01:51:00 +0100 Subject: [maker-devel] Est2Genome Problems In-Reply-To: <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> Message-ID: <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> Hi MAKER community, I think I am seeing the same issue that Jason has reported. I ran cufflinks, then cufflinks2gff3, and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting gff file from maker I only get protein2genome and repeatmasker evidence. If I do a search in the maker log, est2genome never comes up. I also tried to extract the cufflinks results as fasta and feed them to MAKER via 'est='. Still no indication that the evidence is used. I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time! cheers, Christoph On 10/02/2015 17:56, Carson Holt wrote: > I ran a few est2genome runs with a cufflinks file i just generated and > did not get any issues for EST based gene models. > > I'd like to at least have your test set to see if I can duplicate what > you are seeing. > > Use this to upload the job files then I can just run it from my server > here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi > > --Carson > > >> On Feb 3, 2015, at 11:13 AM, Jason Gallant > > wrote: >> >> Hi Folks, >> >> I've nearly succeeded at getting MAKER to run on AWS... I've been >> checking the output files, and have noticed that none of my RNAseq >> data was incorporated on the run. I used Cufflinks to perform >> alignments of libraries from several tissues, ran the accessory >> script cufflinks2gff3 for each tissue, then concatenated the >> resulting gff3 files. I even ran the accessory script gff3_merge to >> check that the resulting file was properly formatted.
>> >> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only >> get protein2genome and repeatmasker evidence in my resulting maker >> gff3 file, and the genes predicted by these. Is there another option >> that I need to enable in order to use my est_gff file? I'm trying to >> get a set of genes to train the predictors for my next step. >> >> Any help would (as always) be greatly appreciated! >> >> Best, >> Jason Gallant >> >> -- >> Dr. Jason R. Gallant >> Assistant Professor >> Room 38 Natural Sciences >> Department of Zoology >> Michigan State University >> East Lansing, MI 48824 >> jgallant at msu.edu >> office: 517-884-7756 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From o.k.torresen at ibv.uio.no Thu Mar 9 02:36:27 2017 From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen) Date: Thu, 9 Mar 2017 09:36:27 +0000 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing Message-ID: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> Hi all, I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta.
Is 3.1 available somewhere, or is it going to be released soon? I also saw a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued by the last sentence "...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association." Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this. Thank you. Sincerely, Ole From carsonhh at gmail.com Thu Mar 9 10:52:30 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 9 Mar 2017 10:52:30 -0700 Subject: [maker-devel] Differences in non_overlapping protein file between runs In-Reply-To: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com> References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com> Message-ID: My guess is that there is an issue with the GFF3 file you supplied, so its features are not overlapping anything. --Carson > On Mar 6, 2017, at 9:51 AM, YannDussert wrote: > > Hello, > > First, thank you for developing MAKER, this is a great annotation tool! > > I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1. > > After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000-17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation on a genome browser, one of the problems was apparently gene fusions due to bad protein evidence.
Following the advice on another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence, did I miss something? > > Thank you in advance for your answer. > > Best regards, > Yann > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Mar 9 11:39:11 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 9 Mar 2017 11:39:11 -0700 Subject: [maker-devel] Est2Genome Problems In-Reply-To: <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> Message-ID: <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com> Jason never responded back to this one or uploaded his file to test. He probably figured it out off list. My guess is that your results are too fragmented to build a model that can pass the filtering thresholds. If you want I can take a look. You can upload all files for a test job here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi --Carson > On Mar 7, 2017, at 5:51 PM, Christoph Hahn wrote: > > Hi MAKER community, > > I think I am seeing the same issue that Jason has reported. I ran cufflinks, then cufflinks2gff3, and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting gff file from maker I only get protein2genome and repeatmasker evidence. If I do a search in the maker log, est2genome never comes up. I also tried to extract the cufflinks results as fasta and feed them to MAKER via 'est='. Still no indication that the evidence is used.
> > I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time! > > cheers, > Christoph > > On 10/02/2015 17:56, Carson Holt wrote: >> I ran a few est2genome runs with a cufflinks file i just generated and did not get any issues for EST based gene models. >> >> I'd like to at least have your test set to see if I can duplicate what you are seeing. >> >> Use this to upload the job files then I can just run it from my server here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> >> --Carson >> >> >>> On Feb 3, 2015, at 11:13 AM, Jason Gallant > wrote: >>> >>> Hi Folks, >>> >>> I've nearly succeeded at getting MAKER to run on AWS... I've been checking the output files, and have noticed that none of my RNAseq data was incorporated on the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting gff3 files. I even ran the accessory script gff3_merge to check that the resulting file was properly formatted. >>> >>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting maker gff3 file, and the genes predicted by these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step. >>> >>> Any help would (as always) be greatly appreciated! >>> >>> Best, >>> Jason Gallant >>> >>> -- >>> Dr. Jason R.
Gallant >>> Assistant Professor >>> Room 38 Natural Sciences >>> Department of Zoology >>> Michigan State University >>> East Lansing, MI 48824 >>> jgallant at msu.edu >>> office: 517-884-7756 >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Mar 9 11:51:25 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 9 Mar 2017 11:51:25 -0700 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing In-Reply-To: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> Message-ID: <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus. It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome). I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. 
It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) --> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure --Carson > On Mar 9, 2017, at 2:36 AM, Ole Kristian Tørresen wrote: > > Hi all, > I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. > > First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? > > I also saw a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued by the last sentence "...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association." Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this. > > Thank you. > > Sincerely, > Ole > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
URL: From lucys-world at mailbox.org Tue Mar 7 01:39:40 2017 From: lucys-world at mailbox.org (lucys-world at mailbox.org) Date: Tue, 7 Mar 2017 09:39:40 +0100 (CET) Subject: [maker-devel] Ab initio gene prediction; 0 genes when creating HMM via SNAP In-Reply-To: <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com> References: <850873370.6534.1488811234072@office.mailbox.org> <83BC008A-F9CF-4FBA-AB47-BD2125A474BE@gmail.com> Message-ID: <1407048207.7112.1488875981292@office.mailbox.org> Hello Carson, hello Daniel, thank you for your fast reply and help. To Daniel's question: Yes, unfortunately I had protein2genome=1 in all runs. To Carson: After reading a lot through the forum I figured out that I had misunderstood ab initio gene prediction. I thought one had to perform 3 maker runs in total: one training run and then two maker runs for annotation. But now I think there are only two maker runs to perform in total (one training run and then one annotation run), is that correct? So after my first run I created an HMM based on the first gene-stats (with 7445 genes) and performed my second run with this HMM. Then I tried to create a new HMM based on my second run output. I think that is not necessary, since the output of the second run should be my annotated genome? I think I have to redo my maker runs, and for that I have two questions regarding the maker_opts.ctl: 1. Training run: For that I have to give maker my genome and my evidence (in my case the BUSCO and Swiss-Prot data sets) and set protein2genome=1. Since that is my only evidence, I don't change anything else? (I don't add anything in the gene prediction section?) 2. Annotation run: With the gff output of the training run I create my own HMM with SNAP. In the maker_opts.ctl I then add my SNAP HMM for this annotation run and set augustus_species to the most closely related species (as recommended in the Augustus manual), is that correct? Do I also provide my protein evidence as I did in the training run?
Thank you very much for your time and help with that!

- Lucy

> Carson Holt wrote on 6 March 2017 at 20:48:
>
> It looks like you have no genes to train with. So you did something wrong on your second run. Either no gene predictor was running or you provided no evidence for the predictor, so you produced no models.
>
> --Carson
>
> > On Mar 6, 2017, at 7:40 AM, lucys-world at mailbox.org wrote:
> >
> > Dear maker-devel group,
> >
> > I have some issues with my MAKER ab initio gene prediction (for a new mammal genome) when creating an HMM via SNAP.
> >
> > After two MAKER runs I wanted to create a new HMM for the third MAKER run, but the command
> >
> > fathom genome.ann genome.dna -gene-stats
> >
> > resulted in 0 genes.
> >
> > What I have done so far:
> >
> > * For the first training run I only used BUSCO and the Swiss-Prot database as references (since no ESTs are available for my species). Additionally I set protein2genome=1.
> >
> > * I was able to create an HMM based on all merged *.gff files, but there were not many:
> >   o out of 27,032 scaffolds (sequences) only 280 were used for the HMM; here are the gene-stats:
> >     280 sequences
> >     0.458676 avg GC fraction (min=0.338014 max=0.708052)
> >     7445 genes (plus=3192 minus=4253)
> >     1621 (0.217730) single-exon
> >     5824 (0.782270) multi-exon
> >     168.412018 mean exon (min=1 max=5224)
> >     1464.349243 mean intron (min=30 max=41197)
> >
> > * For the second MAKER run I then used this HMM and again the BUSCO+SwissPort.fasta reference file.
> >   o the gene-stats for the output of the second MAKER run are:
> >     282 sequences
> >     0.473125 avg GC fraction (min=0.338014 max=0.725131)
> >     0 genes (plus=0 minus=0)
> >     0 (-nan) single-exon
> >     0 (-nan) multi-exon
> >     -nan mean exon (min=2147483647 max=0)
> >     -nan mean intron (min=2147483647 max=0)
> >
> > Would you recommend rerunning everything, e.g. with an additional Augustus gene prediction (species=human), or ESTs from related species? (If so, how closely related?)
> >
> > Thank you for your time and help
> >
> > kind regards
> >
> > Lucy

From o.k.torresen at ibv.uio.no Thu Mar 9 12:42:31 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Mar 2017 19:42:31 +0000
Subject: [maker-devel] MAKER version 3.1 and integration with resequencing
In-Reply-To: <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com>
References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com>
Message-ID: <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no>

Hi Carson.

In the article I linked to, "The draft genome sequence and annotation of the desert woodrat Neotoma lepida" (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/)."

So I guess this version is not available, or maybe they meant 3.0beta1 or something.

ACE looks like a really cool tool, I'll pass it on to people that have the correct datasets.

Thank you.

Ole

> On 09 Mar 2017, at 19:51, Carson Holt wrote:
>
> Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus.
>
> It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome).
> > I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. > > It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) ?> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure > > ?Carson > > > >> On Mar 9, 2017, at 2:36 AM, Ole Kristian T?rresen wrote: >> >> Hi all, >> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. >> >> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? >> >> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence ?...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association.? Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this. >> >> Thank you. 
>> >> Sincerely, >> Ole >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Thu Mar 9 12:50:10 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 9 Mar 2017 12:50:10 -0700 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing In-Reply-To: <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> Message-ID: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> My guess is that Michael may have called it 3.1 because he used the subversion repository which is beyond the 3.0-beta download but has not been packaged for release yet. ?Carson > On Mar 9, 2017, at 12:42 PM, Ole Kristian T?rresen wrote: > > Hi Carson. > > In the article I linked to, The draft genome sequence and annotation of the desert woodrat Neotoma lepida (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/).? > > So I guess this version is not available, or maybe they meant 3.0beta1 or something. > > ACE looks like a really cool tool, I?ll pass it on to people that have the correct datasets. > > Thank you. > > Ole > >> On 09 Mar 2017, at 19:51, Carson Holt wrote: >> >> Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus. >> >> It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome). 
>> >> I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. >> >> It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) ?> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure >> >> ?Carson >> >> >> >>> On Mar 9, 2017, at 2:36 AM, Ole Kristian T?rresen wrote: >>> >>> Hi all, >>> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. >>> >>> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? >>> >>> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence ?...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association.? Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this. >>> >>> Thank you. 
>>> >>> Sincerely, >>> Ole >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > From o.k.torresen at ibv.uio.no Thu Mar 9 12:55:00 2017 From: o.k.torresen at ibv.uio.no (=?utf-8?B?T2xlIEtyaXN0aWFuIFTDuHJyZXNlbg==?=) Date: Thu, 9 Mar 2017 19:55:00 +0000 Subject: [maker-devel] MAKER version 3.1 and integration with resequencing In-Reply-To: <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> References: <5307593A-B6ED-4680-B00C-DC9132CF2D95@ibv.uio.no> <46069559-E05E-43D6-B9DC-DAD987E1D2BA@gmail.com> <319496A6-CB15-4C4F-9070-C2A56C7C6A32@ibv.uio.no> <8FFC703A-9895-4081-81D9-49A2BB494F8A@gmail.com> Message-ID: Ah, thank you. That explains it. Ole > On 09 Mar 2017, at 20:50, Carson Holt wrote: > > My guess is that Michael may have called it 3.1 because he used the subversion repository which is beyond the 3.0-beta download but has not been packaged for release yet. > > ?Carson > > >> On Mar 9, 2017, at 12:42 PM, Ole Kristian T?rresen wrote: >> >> Hi Carson. >> >> In the article I linked to, The draft genome sequence and annotation of the desert woodrat Neotoma lepida (http://www.sciencedirect.com/science/article/pii/S2213596016300800), this sentence is found: "To annotate the whole genome, MAKER version 3.1 was run on Neotoma lepida using Trinity assembled mRNA-seq reads (described above), and all annotated mouse and rat proteins available from NCBI (ftp://ftp.ncbi.nih.gov/genomes/).? >> >> So I guess this version is not available, or maybe they meant 3.0beta1 or something. >> >> ACE looks like a really cool tool, I?ll pass it on to people that have the correct datasets. >> >> Thank you. >> >> Ole >> >>> On 09 Mar 2017, at 19:51, Carson Holt wrote: >>> >>> Currently only 3.0 beta is available. It integrates EVM, and slightly alters some prediction hints for algorithms like Augustus. 
>>> >>> It can be used to identify genes on a new reference or update existing gene models (requires that existing models be in GFF3 against the reference genome). >>> >>> I think in the presentation Mark was referring to a separate MAKER fork. The MAKER fork will take a species reference genome, a VCF file derived from resequenced individuals, and it will rebuild gene models around the individual variation. This allows us to identify simple changes like amino acid substitutions between individuals as well as complex changes related to splicing, exon skipping, etc. >>> >>> It uses the prediction tool described in this paper (paper contains several examples of variation we can properly predict against) ?> https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btw799/2736367/High-throughput-interpretation-of-gene-structure >>> >>> ?Carson >>> >>> >>> >>>> On Mar 9, 2017, at 2:36 AM, Ole Kristian T?rresen wrote: >>>> >>>> Hi all, >>>> I was asked to provide some text for a short description of assembly and annotation of a genome, and did some quick googling to see if I was up to date on what has happened with MAKER lately. >>>> >>>> First I found the publication from last year describing sequencing and annotation of the desert woodrat (http://www.sciencedirect.com/science/article/pii/S2213596016300800). When reading that article, I saw references to MAKER 3.1. As far as I can see from http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi, the latest MAKER is 3.00.0-beta. Is 3.1 available somewhere, or is it going to be released soon? >>>> >>>> I also saw that a poster that was presented at PAG last year (https://pag.confex.com/pag/xxiv/webprogram/Paper19035.html) and was intrigued with the last sentence ?...integrating MAKER with resequencing efforts to enable rapid genotype-phenotype association.? Is this part of MAKER 3.1, or a separate effort? I am very interested in the status of this. >>>> >>>> Thank you. 
>>>>
>>>> Sincerely,
>>>> Ole
>>>
>>
>

From chrisi.hahni at gmail.com Fri Mar 10 01:50:52 2017
From: chrisi.hahni at gmail.com (Christoph Hahn)
Date: Fri, 10 Mar 2017 09:50:52 +0100
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com>
Message-ID: <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com>

Dear Carson,

Thanks for getting in touch! I actually managed in the end. I converted the GTF I had from Cufflinks to GFF3 via the script 'gtf2gff.pl' from Augustus and then used the script 'gffGetmRNA.pl', again from Augustus, to extract the mRNA as FASTA. I fed this file to MAKER via the 'est=' route and now I get plenty of est2genome evidence in the MAKER result.

So the problem seems to be limited to the 'est_gff=' route: although there is no error message whatsoever, the est2genome routine never seems to be triggered. I'd still be happy to upload my data (the Cufflinks GFF, the genome FASTA, anything else?) if you want to try to reproduce the problem. Let me know!

By the way, I seem to be unable to create a new topic or respond to topics via Google Groups. Is the list closed, or is access restricted somehow? I only managed by responding directly from my Gmail to Jason's mail, which I still had in my inbox.

Thanks!

cheers,
Christoph

On 09/03/2017 19:39, Carson Holt wrote:
> Jason never responded back to this one or uploaded his file to test.
> He probably figured it out off list.
> My guess is that your results are too fragmented to build a model that can pass filtering thresholds with.
>
> If you want I can take a look. You can upload all files for a test job here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>
> --Carson
>
>> On Mar 7, 2017, at 5:51 PM, Christoph Hahn wrote:
>>
>> Hi MAKER community,
>>
>> I think I am seeing the same issue that Jason has reported. ran cufflinks, then cufflinks2gff3 and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting gff file from maker I only get protein2genome and repeatmasker evidence. If I do a search in the maker log est2genome never comes up. Tried to extract the cufflinks results as fasta and feed to MAKER via 'est='. Still no indication that the evidence is used.
>>
>> I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time!
>>
>> cheers,
>> Christoph
>>
>> On 10/02/2015 17:56, Carson Holt wrote:
>>> I ran a few est2genome runs with a cufflinks file i just generated and did not get any issues for EST based gene models.
>>>
>>> I'd like to at least have your test set to see if I can duplicate what you are seeing.
>>>
>>> Use this to upload the job files then I can just run it from my server here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> --Carson
>>>
>>>> On Feb 3, 2015, at 11:13 AM, Jason Gallant wrote:
>>>>
>>>> Hi Folks,
>>>>
>>>> I've nearly succeeded at getting MAKER to run on AWS... I've been checking the output files, and have noticed that none of my RNAseq data was incorporated on the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting gff3 files. I even ran the accessory script gff3merge to check that the resulting file was properly formatted.
>>>>
>>>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting maker gff3 file, and the genes predicted by these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step.
>>>>
>>>> Any help would (as always) be greatly appreciated!
>>>>
>>>> Best,
>>>> Jason Gallant
>>>>
>>>> Dr. Jason R. Gallant
>>>> Assistant Professor
>>>> Room 38 Natural Sciences
>>>> Department of Zoology
>>>> Michigan State University
>>>> East Lansing, MI 48824
>>>> jgallant at msu.edu
>>>> office: 517-884-7756

From dussert.yann at gmail.com Fri Mar 10 03:53:36 2017
From: dussert.yann at gmail.com (YannDussert)
Date: Fri, 10 Mar 2017 11:53:36 +0100
Subject: [maker-devel] Differences in non_overlapping protein file between runs
In-Reply-To:
References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com>
Message-ID: <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com>

Hi,

Thank you for your answer. To get my gff with ab-initio predictions, I just took the corresponding lines in the maker gff from the previous round. I can't see any problem with it, it looks like this:

Plvit001 augustus_masked match 66626 70338 0.85 + . ID=Plvit001:hit:12095:4.5.0.0;Name=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1
Plvit001 augustus_masked match_part 66626 67586 0.85 + . ID=Plvit001:hsp:27621:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1 961 +;Gap=M961
Plvit001 augustus match 66626 70338 1 + . ID=Plvit001:hit:12088:4.5.0.0;Name=augustus-Plvit001-abinit-gene-0.0-mRNA-1
Plvit001 augustus match_part 66626 70096 1 + . ID=Plvit001:hsp:27610:4.5.0.0;Parent=Plvit001:hit:12088:4.5.0.0;Target=augustus-Plvit001-abinit-gene-0.0-mRNA-1 1 3471 +;Gap=M3471
Plvit001 augustus_masked match_part 68166 68486 0.85 + . ID=Plvit001:hsp:27622:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 962 1282 +;Gap=M321
Plvit001 augustus_masked match_part 69504 70096 0.85 + . ID=Plvit001:hsp:27623:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1283 1875 +;Gap=M593
Plvit001 augustus_masked match_part 70174 70338 0.85 + . ID=Plvit001:hsp:27624:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1876 2040 +;Gap=M165

Best regards,
Yann

On 09/03/2017 18:52, Carson Holt wrote:
> My guess is that there is either an issue with the GFF3 file you supplied, so its features are not overlapping anything.
>
> --Carson
>
>> On Mar 6, 2017, at 9:51 AM, YannDussert wrote:
>>
>> Hello,
>>
>> First, thank you for developing MAKER, this is a great annotation tool!
>>
>> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.
>>
>> After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000-17000 genes (numbers from other annotated oomycetes). There was only around 1500 genes in the non_overlapping protein file. After looking at the annotation on a genome browser, one of the problems was apparently gene fusions due to bad protein evidence. Following the advice on another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence, did I miss something?
>>
>> Thank you in advance for your answer.
>>
>> Best regards,
>> Yann

From ereboperezsilva at gmail.com Fri Mar 10 04:05:29 2017
From: ereboperezsilva at gmail.com (José Mª G. Perez-Silva)
Date: Fri, 10 Mar 2017 12:05:29 +0100
Subject: [maker-devel] ERROR: Chunk failed
Message-ID:

Hi!

I'm having some trouble understanding the ERROR I'm receiving. Recently I set up a new machine to annotate a genome (around 2 Gb) using MAKER. We mounted a new 1 TB disk and loaded onto it the files of an incomplete annotation run (we started it on a different machine and moved it to this one, which has more processing power). Apparently everything was OK until sometime yesterday, when we received the following ERROR:

> examining contents of the fasta file and run log
> ERROR: could not make datastore directory
> --> rank=NA, hostname=Planarian2
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:Contig4633

We are running 16 MAKER jobs at the same time on the unsplit genome. We checked, and the "df" command reported that only 7% of the mounted disk was used. So space does not appear to be the problem... Why that error then?

Thanks for the help.

José María González Pérez-Silva.
PhD student at Universidad de Oviedo.

From ereboperezsilva at gmail.com Fri Mar 10 10:21:38 2017
From: ereboperezsilva at gmail.com (José Mª G. Perez-Silva)
Date: Fri, 10 Mar 2017 18:21:38 +0100
Subject: [maker-devel] Maker ERROR
Message-ID:

Hi,

I wrote earlier today about a problem that (apparently) concerned disk space. After I deleted some unnecessary files (despite having plenty of storage left), I killed all the processes and set 'clean_try=1' as recommended in this post. Before re-running the processes, we checked that there was no limit on the size of a directory or anything similar.

After re-running, everything seemed correct at first, but when I checked again some time later, I found a lot of contigs with the status FAILED and no folder specification in '_master_datastore_index.log', looking like this:

> Contig480 FAILED
> Contig496 FAILED
> Contig512 FAILED
> Contig528 FAILED
> Contig544 FAILED
> Contig560 FAILED

But checking the 'nohup.out' of every process (16 in total, as the machine has 16 cores), I noticed that each run does, from time to time, process a contig correctly. So, after a lot of FAILED contigs, it processes one correctly. As said in the previous email, the ERROR displayed in nohup.out is (including the last part of a processed contig at the beginning):

> #--------- command -------------#
> Widget::blastx:
> /usr/bin/blastall -p blastx -d /data/ge/tmp/maker_VfDQQU/hsap_ensembl%2Efa.mpi.10.6 -i /data/ge/tmp/maker_VfDQQU/0/Contig20.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 4 -U -F T -I T -o /data/ge/round3/cg.maker.output/cg_datastore/56/AC/Contig20//theVoid.Contig20/0/Contig20.0.hsap_ensembl%2Efa.blastx.temp_dir/hsap_ensembl%2Efa.mpi.10.6.blastx
> #-------------------------------#
> deleted:511 hits
> doing blastx of proteins
> open3: fork failed: Cannot allocate memory at /home/jmgps/software/maker/bin/../lib/File/NFSLock.pm line 1037.
> --> rank=NA, hostname=Planarian2
> ERROR: Failed while doing blastx of proteins
> ERROR: Chunk failed at level:8, tier_type:3
> FAILED CONTIG:Contig20
>
> ERROR: Chunk failed at level:4, tier_type:0
> FAILED CONTIG:Contig20
>
> examining contents of the fasta file and run log
> ERROR: could not make datastore directory
> --> rank=NA, hostname=Planarian2
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:Contig22
>
> examining contents of the fasta file and run log
> ERROR: could not make datastore directory
> --> rank=NA, hostname=Planarian2
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:Contig24
>
> examining contents of the fasta file and run log
> ERROR: could not make datastore directory
> --> rank=NA, hostname=Planarian2
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:Contig26
>
> examining contents of the fasta file and run log
> ERROR: could not make datastore directory
> --> rank=NA, hostname=Planarian2
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:Contig28

I'm totally lost here. I think it is still processing contigs, but the FAILED attempts slow down the whole process, and we are in a hurry because of upcoming maintenance on the machine. And I can't understand the source of the ERROR.

I will be more than happy to provide more details about the problem, if requested.

Thanks a lot for the help!

From carsonhh at gmail.com Fri Mar 10 10:34:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 Mar 2017 10:34:34 -0700
Subject: [maker-devel] Maker ERROR
In-Reply-To:
References:
Message-ID:

Several things.

1. MAKER does a lot of its work in a temporary directory (usually /tmp). This directory must be locally mounted and cannot be a network mounted location. If this location is full you can get issues.

2. MAKER needs at least 1GB of RAM per process (2-3GB is safer), so if you don't have enough RAM you may need to run fewer processes (with MPI, multiply whatever you supplied to the mpiexec -n flag by 1GB).

3. If you are launching MAKER multiple times as opposed to launching once via MPI, you will exacerbate the above limitations as well as open up IO limitations. MAKER can and does saturate IO when run multiple times simultaneously (this is especially true for network mounted locations). If you run via MPI you can greatly reduce IO, so make sure you are using MPI and not just launching MAKER multiple times. If you absolutely have to start multiple jobs, you can reduce IO somewhat by splitting the input fasta into pieces (use fasta_tool). Give a separate piece to each job via MAKER's -g flag, and set -base so all results from all jobs get written to the same location. Then each job can avoid multiple file locks that would have been encountered by sharing input. Note that you must rebuild the datastore index using 'maker -dsindex' when all jobs complete.

--Carson

> On Mar 10, 2017, at 10:21 AM, José Mª G.
Perez-Silva wrote: > > Hi, > > I wrote early this day, in reference to a problem of (apparently) space. After I deleted some unnecesary files (despite having plenty of storage left), I killed all the processes, and set 'clean_try=1' as recomended in this post . Before re-running the processes, we checked that there were no limitation over the size of a directory or something similar. > > After re-running, at first, all seemed correct, but when I re-checked some time after, I found out a lot of contigs with the status FAILED without folder specification in the '_master_datastore_index.log', looking like: > > Contig480 FAILED > Contig496 FAILED > Contig512 FAILED > Contig528 FAILED > Contig544 FAILED > Contig560 FAILED? > > But checking the 'nohub.out' of every proccess (16 in total, as the machine has 16 cores), I notice that each run is, from time to time, processing the contig correctly. So, after several (a lot) of FAILED contigs, it process one correctly. As said in the previous email, the ERROR dispolayed in the nohup.out is (including the last part of a processed contig at the beguinning): > > ?#--------- command -------------# > Widget::blastx: > /usr/bin/blastall -p blastx -d /data/ge/tmp/maker_VfDQQU/hsap_ensembl%2Efa.mpi.10.6 -i /data/ge/tmp/maker_VfDQQU/0/Contig20.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 4 -U -F T -I T -o /data/ge/round3/cg.maker.output/cg_datastore/56/AC/Contig20//theVoid.Contig20/0/Contig20.0.hsap_ensembl%2Efa.blastx.temp_dir/hsap_ensembl%2Efa.mpi.10.6.blastx > #-------------------------------# > deleted:511 hits > doing blastx of proteins > open3: fork failed: Cannot allocate memory at /home/jmgps/software/maker/bin/../lib/File/NFSLock.pm line 1037. 
> --> rank=NA, hostname=Planarian2 > ERROR: Failed while doing blastx of proteins > ERROR: Chunk failed at level:8, tier_type:3 > FAILED CONTIG:Contig20 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:Contig20 > > examining contents of the fasta file and run log > ERROR: could not make datastore directory > --> rank=NA, hostname=Planarian2 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:Contig22 > > examining contents of the fasta file and run log > ERROR: could not make datastore directory > --> rank=NA, hostname=Planarian2 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:Contig24 > > examining contents of the fasta file and run log > ERROR: could not make datastore directory > --> rank=NA, hostname=Planarian2 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:Contig26 > > examining contents of the fasta file and run log > ERROR: could not make datastore directory > --> rank=NA, hostname=Planarian2 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:Contig28? > > I'm totally lost here, I think it is still processing contigs, but the FAILED attemps slow down the whole process, and we are in a hurry due to the maintenance of the machine. And I can't understand the source of the ERROR. > > I will be more than happy to provide more details about the problem, if requested. > > Thanks a lot for the help! -------------- next part -------------- An HTML attachment was scrubbed... 

From carsonhh at gmail.com  Tue Mar 14 10:16:25 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Mar 2017 10:16:25 -0600
Subject: [maker-devel] Differences in non_overlapping protein file between runs
In-Reply-To: <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com>
References: <2a2006dc-9332-3479-c193-0d90a26d9909@gmail.com> <84509b8b-84f6-b2d8-29ea-d86fc2177def@gmail.com>
Message-ID: <9EC90572-7E3F-4B07-9098-6CAFD7B3A4B0@gmail.com>

I see you have both masked and unmasked augustus calls, so you may have a lot of non-masked predictions in your second run that are entirely contained in transposons and repeat regions (that is why they do not overlap).

Really the easiest thing to do would be to open the results in a browser, find one of the predictions listed as non-overlapping, and then look at it to see why it is not overlapping. You can then look at that specific location directly in the file as needed, but it will be much easier to interpret looking at the features drawn in a browser (like Apollo - desktop version).

--Carson

> On Mar 10, 2017, at 3:53 AM, YannDussert wrote:
>
> Hi,
>
> Thank you for your answer. To get my gff with ab-initio predictions, I just took the corresponding lines in the maker gff from the previous round.
>
> I can't see any problem with it, it looks like this:
>
> Plvit001 augustus_masked match 66626 70338 0.85 + . ID=Plvit001:hit:12095:4.5.0.0;Name=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1
> Plvit001 augustus_masked match_part 66626 67586 0.85 + . ID=Plvit001:hsp:27621:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1 961 +;Gap=M961
> Plvit001 augustus match 66626 70338 1 + . ID=Plvit001:hit:12088:4.5.0.0;Name=augustus-Plvit001-abinit-gene-0.0-mRNA-1
> Plvit001 augustus match_part 66626 70096 1 + . ID=Plvit001:hsp:27610:4.5.0.0;Parent=Plvit001:hit:12088:4.5.0.0;Target=augustus-Plvit001-abinit-gene-0.0-mRNA-1 1 3471 +;Gap=M3471
> Plvit001 augustus_masked match_part 68166 68486 0.85 + . ID=Plvit001:hsp:27622:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 962 1282 +;Gap=M321
> Plvit001 augustus_masked match_part 69504 70096 0.85 + . ID=Plvit001:hsp:27623:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1283 1875 +;Gap=M593
> Plvit001 augustus_masked match_part 70174 70338 0.85 + . ID=Plvit001:hsp:27624:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1876 2040 +;Gap=M165
>
> Best regards,
>
> Yann
>
> On 09/03/2017 18:52, Carson Holt wrote:
>> My guess is that there is an issue with the GFF3 file you supplied, so its features are not overlapping anything.
>>
>> --Carson
>>
>>> On Mar 6, 2017, at 9:51 AM, YannDussert wrote:
>>>
>>> Hello,
>>>
>>> First, thank you for developing MAKER, this is a great annotation tool!
>>>
>>> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.
>>>
>>> After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000-17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation on a genome browser, one of the problems was apparently gene fusions due to bad protein evidence. Following the advice on another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence, did I miss something?
>>>
>>> Thank you in advance for your answer.
>>>
>>> Best regards,
>>> Yann
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com  Tue Mar 14 10:17:58 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Mar 2017 10:17:58 -0600
Subject: [maker-devel] Est2Genome Problems
In-Reply-To: <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com>
References: <1422987193321.4df3c9d5@Nodemailer> <119684F8-8071-4318-A129-3D90EC54242A@gmail.com> <4e2b870a-601d-6f04-0b37-42e940749dfd@gmail.com> <33720C49-5D1B-46DF-A89C-43A7683D7C02@gmail.com> <27bc6d85-9a64-d30b-bfc9-148c2185a39a@gmail.com>

Sure. Send me the file.

On a side note, I find cufflinks results to be very noisy (lots of false positives). I usually get better results using assembled reads from Trinity (with -jaccard_clip option set), or using Stringtie.

Thanks,
Carson

> On Mar 10, 2017, at 1:50 AM, Christoph Hahn wrote:
>
> Dear Carson,
>
> Thanks for getting in touch! I actually managed in the end. I converted the gtf I had from cufflinks to gff3 via the script 'gtf2gff.pl' from augustus and then used the script 'gffGetmRNA.pl', again from augustus, to extract the mRNA in fasta. This file I fed to MAKER via the 'est=' route and now I get plenty of est2genome evidence in the maker result.
> So the problem seems to be limited to the 'est_gff=' route: although there is no error message whatsoever, the est2genome routine never seems to be triggered.
>
> I'd still be happy to upload my data (the cufflinks gff, the genome fasta, anything else?) if you want to try to reproduce the problem. Let me know!
>
> btw I seem to be unable to create a new topic or respond to topics via google groups. Is the list closed or the access restricted somehow? I only managed by responding to Jason's mail, which I still had in my inbox, directly via my gmail.
>
> Thanks!
>
> cheers,
> Christoph
>
> On 09/03/2017 19:39, Carson Holt wrote:
>> Jason never responded back to this one or uploaded his file to test. He probably figured it out off list. My guess is that your results are too fragmented to build a model that can pass filtering thresholds.
>>
>> If you want I can take a look. You can upload all files for a test job here -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>
>> --Carson
>>
>>> On Mar 7, 2017, at 5:51 PM, Christoph Hahn wrote:
>>>
>>> Hi MAKER community,
>>>
>>> I think I am seeing the same issue that Jason has reported. I ran cufflinks, then cufflinks2gff3, and tried to feed the result to MAKER via 'est_gff=' with 'est2genome=1'. In the resulting gff file from maker I only get protein2genome and repeatmasker evidence. If I do a search in the maker log, est2genome never comes up. I tried to extract the cufflinks results as fasta and feed them to MAKER via 'est='. Still no indication that the evidence is used.
>>>
>>> I am using MAKER 2.31.8. Any help would be much appreciated! Thanks in advance for your time!
>>>
>>> cheers,
>>> Christoph
>>>
>>> On 10/02/2015 17:56, Carson Holt wrote:
>>>> I ran a few est2genome runs with a cufflinks file I just generated and did not get any issues for EST based gene models.
>>>>
>>>> I'd like to at least have your test set to see if I can duplicate what you are seeing.
>>>>
>>>> Use this to upload the job files, then I can just run it from my server here -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>>
>>>> --Carson
>>>>
>>>>> On Feb 3, 2015, at 11:13 AM, Jason Gallant wrote:
>>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I've nearly succeeded at getting MAKER to run on AWS... I've been checking the output files, and have noticed that none of my RNAseq data was incorporated in the run. I used Cufflinks to perform alignments of libraries from several tissues, ran the accessory script cufflinks2gff3 for each tissue, then concatenated the resulting gff3 files. I even ran the accessory script gff3merge to check that the resulting file was properly formatted.
>>>>>
>>>>> For options, I set est2genome=1 and est_gff=cufflinks.gff. I only get protein2genome and repeatmasker evidence in my resulting maker gff3 file, and the genes predicted by these. Is there another option that I need to enable in order to use my est_gff file? I'm trying to get a set of genes to train the predictors for my next step.
>>>>>
>>>>> Any help would (as always) be greatly appreciated!
>>>>>
>>>>> Best,
>>>>> Jason Gallant
>>>>>
>>>>> --
>>>>> Dr. Jason R. Gallant
>>>>> Assistant Professor
>>>>> Room 38 Natural Sciences
>>>>> Department of Zoology
>>>>> Michigan State University
>>>>> East Lansing, MI 48824
>>>>> jgallant at msu.edu
>>>>> office: 517-884-7756

From mnaymik at tgen.org  Tue Mar 14 11:29:49 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Tue, 14 Mar 2017 10:29:49 -0700
Subject: [maker-devel] ThrowNullPointerException()
In-Reply-To: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>

I have now tried with multiple versions of blast (2.6 and 2.28 binaries and built from source) and get the same error:

setting up GFF3 output and fasta chunks
doing blastn of ESTs
running blast search.
#--------- command -------------#
Widget::blastn:
/home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db /scratch/mnaymik/maker/tmp/maker_cah
#-------------------------------#
Error: NCBI C++ Exception:
"/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr

Error: NCBI C++ Exception:
"/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr

examining contents of the fasta file and run log
ERROR: BLASTN failed
--> rank=87, hostname=pnap-pe7-s09
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:6537645

ERROR: BLASTN failed
--> rank=88, hostname=pnap-pe7-s09
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:6537659

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:6537645

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:6537659

On Thu, Mar 2, 2017 at 1:25 PM, Carson Holt wrote:

> Try reinstalling blast, or upgrade to a newer version of blast.
>
> --Carson
>
> On Mar 2, 2017, at 1:05 PM, Marcus Naymik wrote:
>
> I have maker running with MPI and I get this error over and over again for every contig. Any ideas?
>
> MAKER WARNING: All old files will be erased before continuing
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: 5239
> Length: 1395
> #---------------------------------------------------------------------
>
> Error: NCBI C++ Exception:
> "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti
>
> *This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.*

From carsonhh at gmail.com  Tue Mar 14 11:36:07 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Mar 2017 11:36:07 -0600
Subject: [maker-devel] ThrowNullPointerException()
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>

The error itself is coming from BLAST. MAKER does provide the command used, so you can try it outside of MAKER. You can submit the files used as well as the command used to the BLAST developers for them to test with.

MAKER deletes files on failure, but if you edit .../maker/lib/GI.pm, you can stop it from deleting files.

Edit line 58 by setting CLEANUP => 0

Then you should be able to grab whatever files maker used to run blast, and copy the blast command used from STDERR.
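
For anyone who saved MAKER's STDERR to a file, the copying step can be scripted. The following is only a minimal sketch in Python, not part of MAKER: it assumes the command is always logged between the two marker lines seen in the log excerpts in this thread. The function name `extract_commands` and the sample paths are hypothetical placeholders (the real command in this thread is truncated, so it is not reproduced here).

```python
import re

# Marker lines as they appear in the MAKER log excerpts in this thread.
START = re.compile(r"^#-+\s*command\s*-+#\s*$")
END = re.compile(r"^#-+#\s*$")


def extract_commands(log_text):
    """Return (label, command) pairs for each logged command block.

    Hypothetical helper: the label is the 'Widget::...' line that
    follows the opening marker, the command is everything after it
    up to the closing marker, joined with single spaces.
    """
    commands = []
    block = None  # None means we are outside a command block
    for raw in log_text.splitlines():
        line = raw.strip()
        if START.match(line):
            block = []
        elif block is not None and END.match(line):
            if block:
                has_label = block[0].endswith(":")
                label = block[0].rstrip(":") if has_label else ""
                body = block[1:] if has_label else block
                commands.append((label, " ".join(body)))
            block = None
        elif block is not None and line:
            block.append(line)
    return commands


# Placeholder log text; paths are made up for illustration.
sample = """doing blastn of ESTs
running blast search.
#--------- command -------------#
Widget::blastn:
/path/to/blastn -db /path/to/db -query /path/to/chunk.fasta
#-------------------------------#
ERROR: BLASTN failed
"""

for label, cmd in extract_commands(sample):
    print(label, "->", cmd)
```

The extracted command line can then be pasted into a shell and run by hand, provided the temporary files it refers to were kept (CLEANUP => 0 as described above).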

--Carson

From michael.s.campbell1 at gmail.com  Tue Mar 14 20:27:10 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 14 Mar 2017 22:27:10 -0400
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <2017030519265949065818@cau.edu.cn>
References: <2017030519265949065818@cau.edu.cn>
Message-ID: <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com>

Hi Chao Chao,

I've not run into this before. Could you post the RepeatModeler command you used?

Thanks,
Mike

> On Mar 5, 2017, at 6:26 AM, dcg at cau.edu.cn wrote:
>
> Dear sir:
> Before the maker operations, I do repeat masking first on my contigs.
> However, when I followed "Repeat Library Construction-Advanced", no results were generated after running LTRharvest, so I couldn't go any further.
>
> When I attempted to follow "Repeat Library Construction-Basic" to run RepeatModeler, a note caught my attention even though RECON can return some results:
> NOTE: RepeatScout did not return any models.
>
> Is the situation above normal in the masking process? How can I deal with these problems to make a high-quality repeat library for my assembled contigs?
>
> Hope to hear from you.
> Best wishes!
>
> Chao Chao
> 2017.03.05

From dcg at cau.edu.cn  Wed Mar 15 08:26:15 2017
From: dcg at cau.edu.cn (dcg at cau.edu.cn)
Date: Wed, 15 Mar 2017 22:26:15 +0800
Subject: [maker-devel] How to get Pseudogene
Message-ID: <2017031522261575294011@cau.edu.cn>

Dear sir:
I'd like to mask some pseudogenes in my annotation. How can I do it?
In the guide, the first step is "Run a tblastn of the protein sequence (query) vs. the intergenic genome sequence (subject/database)".
My question is: what do the "protein sequence" and the "intergenic genome sequence" each refer to? My own protein database? And how do I use the result in the maker annotation?

Best wishes!

Chao Chao
2017.03.15

From michael.s.campbell1 at gmail.com  Wed Mar 15 09:00:13 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 15 Mar 2017 11:00:13 -0400
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <201703152048212561203@cau.edu.cn>
References: <2017030519265949065818@cau.edu.cn> <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com> <201703152048212561203@cau.edu.cn>
Message-ID: <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com>

Hi Chao Chao,

I'm not sure how to troubleshoot this if there were no error messages. I've cc'd a couple of people that have worked with this protocol much more than I have.

Ning and Kevin,
Do you have any tips for running these tools that may help Chao Chao?

Thanks,
Mike

> On Mar 15, 2017, at 8:48 AM, dcg at cau.edu.cn wrote:
>
> Thanks for your reply!
> I just followed the guide at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced
>
> To use LTRharvest, my commands are as below (the file names are my own choice):
> DIR1/gt suffixerator -db seqfile -indexname seqfileindex -tis -suf -lcp -des -ssp -dna
> DIR1/gt ltrharvest -index seqfileindex -out seqfile.out99 -outinner seqfile.outinner99 -gff3 seqfile.gff99 -minlenltr 100 \
> -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > seqfile.result99
> No error, but no results either.
>
> Chao Chao
> 2017.03.15
>
> From: Michael Campbell
> Date: 2017-03-15 10:27
> To: dcg
> CC: maker-devel
> Subject: Re: [maker-devel] For help about masking repeats before annotation
> Hi Chao Chao,
>
> I've not run into this before.
> Could you post the RepeatModeler command you used?
>
> Thanks,
> Mike

From mnaymik at tgen.org  Wed Mar 15 10:54:48 2017
From: mnaymik at tgen.org (Marcus Naymik)
Date: Wed, 15 Mar 2017 09:54:48 -0700
Subject: [maker-devel] ThrowNullPointerException()
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>

Thanks, you're right. I had to recompile blast from src with this flag: -std=c++0x

From carsonhh at gmail.com  Wed Mar 15 11:00:18 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 15 Mar 2017 11:00:18 -0600
Subject: [maker-devel] ThrowNullPointerException()
References: <37D5C48B-3BA7-4523-BD00-F884E1E0771E@gmail.com>
Message-ID: <6A6C819F-D903-401A-8522-29FEBC955F17@gmail.com>

Glad I could help. Remember to switch back to CLEANUP => 1 if you set it to 0 to debug. Otherwise you will have a lot of files left in /tmp after each MAKER run.

--Carson

> On Mar 15, 2017, at 10:54 AM, Marcus Naymik wrote:
>
> Thanks, you're right. I had to recompile blast from src with this flag: -std=c++0x
>
> On Tue, Mar 14, 2017 at 10:36 AM, Carson Holt wrote:
> The error itself is coming from BLAST. MAKER does provide the command used, so you can try it outside of MAKER. You can submit the files used as well as the command used to the BLAST developers for them to test with.
>
> MAKER deletes files on failure, but if you edit .../maker/lib/GI.pm, you can stop it from deleting files.
>
> Edit line 58 by setting CLEANUP => 0
>
> Then you should be able to grab whatever files maker used to run blast, and copy the blast command used from STDERR.
>
> --Carson
>
>> On Mar 14, 2017, at 11:29 AM, Marcus Naymik wrote:
>>
>> I have now tried with multiple versions of blast (2.6 and 2.28 binaries and built from source) and get the same error:
>>
>> setting up GFF3 output and fasta chunks
>> doing blastn of ESTs
>> running blast search.
>> >> #--------- command -------------#
>> >> Widget::blastn:
>> >> /home/mnaymik/TOOLS/ncbi-blast-2.2.28+/bin/blastn -db /scratch/mnaymik/maker/tmp/maker_cah
>> >> #-------------------------------#
>> >> Error: NCBI C++ Exception:
>> >> "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr
>> >>
>> >> Error: NCBI C++ Exception:
>> >> "/home/mnaymik/TOOLS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Cr
>> >>
>> >> examining contents of the fasta file and run log
>> >> ERROR: BLASTN failed
>> >> --> rank=87, hostname=pnap-pe7-s09
>> >> ERROR: Failed while doing blastn of ESTs
>> >> ERROR: Chunk failed at level:0, tier_type:3
>> >> FAILED CONTIG:6537645
>> >>
>> >> ERROR: BLASTN failed
>> >> --> rank=88, hostname=pnap-pe7-s09
>> >> ERROR: Failed while doing blastn of ESTs
>> >> ERROR: Chunk failed at level:0, tier_type:3
>> >> FAILED CONTIG:6537659
>> >>
>> >> ERROR: Chunk failed at level:4, tier_type:0
>> >> FAILED CONTIG:6537645
>> >>
>> >> ERROR: Chunk failed at level:4, tier_type:0
>> >> FAILED CONTIG:6537659
>> >>
>>
>> On Thu, Mar 2, 2017 at 1:25 PM, Carson Holt wrote:
>> Try reinstalling blast, or upgrade to a newer version of blast.
>>
>> --Carson
>>
>>> On Mar 2, 2017, at 1:05 PM, Marcus Naymik wrote:
>>>
>>> I have maker running with MPI and I get this error over and over again for every contig. Any ideas?
>>>
>>> MAKER WARNING: All old files will be erased before continuing
>>> >>> #---------------------------------------------------------------------
>>> >>> Now starting the contig!!
>>> >>> SeqID: 5239
>>> >>> Length: 1395
>>> >>> #---------------------------------------------------------------------
>>> >>>
>>> >>> Error: NCBI C++ Exception:
>>> >>> "/packages/BUILDS/ncbi-blast-2.2.28+-src/c++/src/corelib/ncbiobj.cpp", line 925: Criti
>>> >>>
>
> This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged, including patient health information.
If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jiangn at msu.edu Wed Mar 15 09:56:30 2017
From: jiangn at msu.edu (Jiang, Ning)
Date: Wed, 15 Mar 2017 15:56:30 +0000
Subject: [maker-devel] For help about masking repeats before annotation
In-Reply-To: <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com>
References: <2017030519265949065818@cau.edu.cn> <9457BA63-7277-478A-8BA7-A4F9296D850D@gmail.com> <201703152048212561203@cau.edu.cn>, <423545A6-83BC-44DA-934A-62603C3CEBC0@gmail.com>
Message-ID:

Hi Chao Chao,

I guess you have an extra "\" in your second command. We put that sign there to indicate that the entire thing belongs to one command (it is too long to fit on one line). I suggest you remove the "\" and try again.

Good luck!

Ning Jiang

________________________________
From: Michael Campbell
Sent: Wednesday, March 15, 2017 11:00:13 AM
To: dcg at cau.edu.cn
Cc: maker-devel; Jiang, Ning; Kevin Childs
Subject: Re: [maker-devel] For help about masking repeats before annotation

Hi Chao Chao,

I'm not sure how to troubleshoot this if there were no error messages. I've cc'd a couple of people who have worked with this protocol much more than I have.

Ning and Kevin,
Do you have any tips for running these tools that may help Chao Chao?

Thanks,
Mike

On Mar 15, 2017, at 8:48 AM, dcg at cau.edu.cn wrote:

Thanks for your reply!
I just followed the guide at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced

To use LTRharvest, my commands are as below (the file names were chosen for my convenience):

DIR1/gt suffixerator -db seqfile -indexname seqfileindex -tis -suf -lcp -des -ssp -dna
DIR1/gt ltrharvest -index seqfileindex -out seqfile.out99 -outinner seqfile.outinner99 -gff3 seqfile.gff99 -minlenltr 100 \ -maxlenltr 6000 -mindistltr 1500 -maxdistltr 25000 -mintsd 5 -maxtsd 5 -motif tgca -similar 99 -vic 10 > seqfile.result99

No error, but no results either.

Chao Chao
________________________________
2017.03.15

From: Michael Campbell
Date: 2017-03-15 10:27
To: dcg
CC: maker-devel
Subject: Re: [maker-devel] For help about masking repeats before annotation

Hi Chao Chao,

I've not run into this before. Could you post the RepeatModeler command you used?

Thanks,
Mike

On Mar 5, 2017, at 6:26 AM, dcg at cau.edu.cn wrote:

Dear sir:

Before running MAKER, I first mask repeats in my contigs. However, when I followed "Repeat Library Construction-Advanced", no results were generated after running LTRharvest, so I could not go any further. When I tried to follow "Repeat Library Construction-Basic" to run RepeatModeler, a note caught my attention, even though RECON did return some results:

NOTE: RepeatScout did not return any models.

Is the situation above normal during masking? How can I deal with these problems to build a high-quality repeat library for my assembled contigs?

Hope to hear from you.
Best wishes!

Chao Chao
________________________________
2017.03.05
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
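
Ning's point about the stray "\" in the message above can be illustrated with a small sketch. When the ltrharvest command is entered on a single line, as it appears in the message, a POSIX shell treats "\ " as an escaped space and glues that space onto the next word. The snippet below uses Python's shlex module as a stand-in for the shell's word splitting (an approximation, not the shell itself) on a shortened copy of the command. Whether this is what actually caused the empty output here is not confirmed anywhere in the thread.

```python
import shlex

# Shortened copy of the ltrharvest command from the message, typed on one line.
with_backslash = r"gt ltrharvest -index seqfileindex -minlenltr 100 \ -maxlenltr 6000"
without_backslash = "gt ltrharvest -index seqfileindex -minlenltr 100 -maxlenltr 6000"


def split_words(command):
    """Split a command line roughly the way a POSIX shell would."""
    return shlex.split(command)


broken = split_words(with_backslash)
fixed = split_words(without_backslash)

# With the stray backslash, the option name carries a leading space,
# so the program no longer receives a plain "-maxlenltr" argument.
print(broken[5:])
print(fixed[5:])
```

A backslash is only a line continuation when it is the very last character on the line; anywhere else it escapes the character that follows it.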
URL:

From carsonhh at gmail.com Thu Mar 16 09:19:02 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 16 Mar 2017 09:19:02 -0600
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To:
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu> <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com> <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com> <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com> <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com> <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com>
Message-ID: <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>

Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR for example (or hint based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied. Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but are only there for reference purposes (there will also be a raw fasta from each source, but only for reference purposes). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail in the MAKER wiki and the MAKER2 paper for how models compete.

So the final result is not a superset, rather a merged subset from each potential source.

EVM is not used to obtain a consensus gene model. Its results compete just like all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn't work it produces either no model or partial models.

--Carson

> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>
> Dear Carson,
>
> thank you so much! I am now peeking into the results for the finished scaffolds. In the gff file, the gene id confuses me a bit.
In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm", "augustus" etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?
>
> Best Regards,
> Ray
>
> Dr. Rongfeng (Ray) Cui
> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
> Wissenschaftlicher MA / Postdoctoral researcher
> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
> Tel.:+49 (0)221 496
> Mobile: +49 0221 37970 496
> rcui at age.mpg.de
> www.age.mpg.de
>
> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
> Maybe. I haven't tested this, but it should work. Maker supports labels for input by placing a ":" and a label after each file name.
>
> Example ->
> est=file1.fasta:label_1,file2.fasta:label_2
>
> If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.
>
> As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3 ->
> evmtrans:est2genome:label_1=10
>
> I don't know if the label will force any raw analysis to rerun, but it shouldn't.
>
> --Carson
>
>> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote:
>>
>> Hi Carson,
>>
>> currently I am partitioning the protein evidence based on phylogenetic relationship into several datasets, supplied as a comma-delimited list. Is it possible then to specify a higher weight for protein2genome models from more closely related species than for more distantly related taxa?
>>
>> Ray
>>
>> Dr.
Rongfeng (Ray) Cui
>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>> Wissenschaftlicher MA / Postdoctoral researcher
>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>> Tel.:+49 (0)221 496
>> Mobile: +49 0221 37970 496
>> rcui at age.mpg.de
>> www.age.mpg.de
>>
>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui wrote:
>> Dear Carson,
>>
>> thank you for the pointers! Before running the first round of Maker, I mapped conspecific Trinity assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained Genemark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see.
>>
>> It is good to know that MAKER will reuse the old results.
>>
>> Best Regards,
>> Ray
>>
>> Dr. Rongfeng (Ray) Cui
>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>> Wissenschaftlicher MA / Postdoctoral researcher
>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>> Tel.:+49 (0)221 496
>> Mobile: +49 0221 37970 496
>> rcui at age.mpg.de
>> www.age.mpg.de
>>
>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt wrote:
>> You can find lots of info in the devel archives on training. Example -> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI
>>
>> Also example of training SNAP on the wiki -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>>
>> MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them.
The exception to this is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. Also, they are not archived because doing so created IO issues (exonerate is not running in bulk batches like BLAST, rather as multiple small separate runs for each polished read, and archiving a lot of small raw reports can occur so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate rather than develop a database-like bundling/compression mechanism to get around the IO issues.
>>
>> Thanks,
>> Carson
>>
>>> On Mar 14, 2017, at 10:44 AM, Ray Cui wrote:
>>>
>>> Hi Carson,
>>> Thanks for your prompt response!
>>>
>>> I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models?
>>> After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc) in the second iteration, since they are exactly the same as the first run. Perhaps maker can take in the alignment results from the previous run?
>>>
>>> Best Regards,
>>> Ray
>>>
>>> Dr. Rongfeng (Ray) Cui
>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>> Wissenschaftlicher MA / Postdoctoral researcher
>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>> Tel.:+49 (0)221 496
>>> Mobile: +49 0221 37970 496
>>> rcui at age.mpg.de
>>> www.age.mpg.de
>>>
>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui wrote:
>>> I see.
If my evm config looks like this:
>>> evmab=5 #default weight for source unspecified ab initio predictions
>>> evmab:snap=5 #weight for snap sourced predictions
>>> evmab:augustus=10 #weight for augustus sourced predictions
>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions
>>> evmab:genemark=5 #weight for genemark sourced predictions
>>>
>>> and Column 2 in the genemark.gff is "GeneMark.hmm", then the value from "evmab" (=5) will be used, is that correct?
>>>
>>> Best Regards,
>>> Ray
>>>
>>> Dr. Rongfeng (Ray) Cui
>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>> Wissenschaftlicher MA / Postdoctoral researcher
>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>> Tel.:+49 (0)221 496
>>> Mobile: +49 0221 37970 496
>>> rcui at age.mpg.de
>>> www.age.mpg.de
>>>
>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt wrote:
>>> Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. That column will also be used by EVM to bin features by their source and apply weights based on source.
>>>
>>> --Carson
>>>
>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui wrote:
>>>>
>>>> Thanks! I didn't know you can also name the gff, but I think using the default is fine, that's what I'm doing now.
>>>>
>>>> Best Regards,
>>>> Ray
>>>>
>>>> Dr. Rongfeng (Ray) Cui
>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>> Tel.:+49 (0)221 496
>>>> Mobile: +49 0221 37970 496
>>>> rcui at age.mpg.de
>>>> www.age.mpg.de
>>>>
>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt wrote:
>>>>
>>>> These are set in the maker_evm.ctl file.
>>>>
>>>> Use whatever you used in the source column of the input GFF3. For example, if column 2 is set as GENEMARK, then do this ->
>>>> evmab:GENEMARK=7
>>>>
>>>> This also works ->
>>>> evmab:pred_gff:GENEMARK=7
>>>>
>>>> Or just set the default ->
>>>> evmab=7
>>>>
>>>> --Carson
>>>>
>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui wrote:
>>>>>
>>>>> Dear Carson,
>>>>>
>>>>> I think it may be the most straightforward to input the GFF3 instead.
>>>>>
>>>>> What is the correct way of setting a weight for the EVM step for these GFF3 models passed through the pred_gff option?
>>>>>
>>>>> Ray
>>>>>
>>>>> Dr. Rongfeng (Ray) Cui
>>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>>> Tel.:+49 (0)221 496
>>>>> Mobile: +49 0221 37970 496
>>>>> rcui at age.mpg.de
>>>>> www.age.mpg.de
>>>>>
>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt wrote:
>>>>> It may work as is as long as you don't need any of the additional options that have been added. If not, you can also just run it outside of MAKER then provide the result in GFF3 format to pred_gff.
>>>>>
>>>>> --Carson
>>>>>
>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui wrote:
>>>>>>
>>>>>> I see. Are there any recent plans to incorporate it into Maker?
>>>>>>
>>>>>> If not, I could try to see if I can adapt the current Maker script.
>>>>>>
>>>>>> Ray
>>>>>>
>>>>>> Dr.
Rongfeng (Ray) Cui
>>>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>>>> Tel.:+49 (0)221 496
>>>>>> Mobile: +49 0221 37970 496
>>>>>> rcui at age.mpg.de
>>>>>> www.age.mpg.de
>>>>>>
>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt wrote:
>>>>>> Yes. This is a recent update. It's an attempt to merge GeneMark-ET and GeneMark-EP into GeneMark-ES scripts.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui wrote:
>>>>>>>
>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap.
>>>>>>>
>>>>>>> I think there must have been a big update between different Genemark versions. It seems that they now also support evidence being fed into the prediction stage.
>>>>>>>
>>>>>>> The name of the latest version of the genemark script has been changed to "gmes_petap.pl", with the following command line options:
>>>>>>>
>>>>>>> Usage: /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] --sequence [filename]
>>>>>>>
>>>>>>> GeneMark-ES Suite version 4.33
>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
>>>>>>>
>>>>>>> Input sequence/s should be in FASTA format
>>>>>>>
>>>>>>> Algorithm options
>>>>>>> --ES to run self-training
>>>>>>> --fungus to run algorithm with branch point model (most useful for fungal genomes)
>>>>>>> --ET [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
>>>>>>> --et_score [number]; 4 (default) minimum score of intron in initiation of the ET algorithm
>>>>>>> --evidence [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
>>>>>>> --training_only to run only training step
>>>>>>> --prediction_only to run only
prediction step
>>>>>>> --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)
>>>>>>>
>>>>>>> Sequence pre-processing options
>>>>>>> --max_contig [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
>>>>>>> --min_contig [number]; 50000 (default); will ignore contigs shorter then min_contig in training
>>>>>>> --max_gap [number]; 5000 (default); will split sequence at gaps longer than max_gap
>>>>>>> Letters 'n' and 'N' are interpreted as standing within gaps
>>>>>>> --max_mask [number]; 5000 (default); will split sequence at repeats longer then max_mask
>>>>>>> Letters 'x' and 'X' are interpreted as results of hard masking of repeats
>>>>>>> --soft_mask [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
>>>>>>>
>>>>>>> Run options
>>>>>>> --cores [number]; 1 (default) to run program with multiple threads
>>>>>>> --pbs to run on cluster with PBS support
>>>>>>> --v verbose
>>>>>>>
>>>>>>> Customizing parameters:
>>>>>>> --max_intron [number]; default 10000 (3000 fungi), maximum length of intron
>>>>>>> --max_intergenic [number]; default 10000, maximum length of intergenic regions
>>>>>>> --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
>>>>>>>
>>>>>>> Developer options:
>>>>>>> --usr_cfg [filename]; to customize configuration file
>>>>>>> --ini_mod [filename]; use this file with parameters for algorithm initiation
>>>>>>> --test_set [filename]; to evaluate prediction accuracy on the given test set
>>>>>>> --key_bin
>>>>>>> --debug
>>>>>>> # -------------------
>>>>>>>
>>>>>>> Dr.
Rongfeng (Ray) Cui
>>>>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>>>>> Tel.:+49 (0)221 496
>>>>>>> Mobile: +49 0221 37970 496
>>>>>>> rcui at age.mpg.de
>>>>>>> www.age.mpg.de
>>>>>>>
>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt wrote:
>>>>>>> Also note that the gmhmme3 executable distributed with different flavors of genemark has had the same name but has been quite different in both command line structure and output between flavors.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui wrote:
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters automatically set by Maker when calling Genemark?
>>>>>>>> If you can point me to the part of the maker source code that constructs the final genemark command line I can also take a look.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Ray
>>>>>>>>
>>>>>>>> Dr. Rongfeng (Ray) Cui
>>>>>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>>>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>>>>>> Tel.:+49 (0)221 496
>>>>>>>> Mobile: +49 0221 37970 496
>>>>>>>> rcui at age.mpg.de
>>>>>>>> www.age.mpg.de
>>>>>>>>
>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt wrote:
>>>>>>>> The names of scripts used are listed in the maker_exe.ctl file. It depends on if formatting or any flags have changed between versions.
>>>>>>>>
>>>>>>>> --Carson
>>>>>>>>
>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui wrote:
>>>>>>>>>
>>>>>>>>> Dear Carson,
>>>>>>>>>
>>>>>>>>> I have now run GeneMark-ET, and it produces a trained .mod file. I think it can then be passed to Maker. Do you know what is the final constructed command line in Maker that calls genemark? Genemark-et and es use the same perl script so one probably only needs to use the --prediction and --predict_with xxx.mod options to predict genes using the species specific parameters (bypassing regular training and prediction steps)
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Ray
>>>>>>>>>
>>>>>>>>> Dr. Rongfeng (Ray) Cui
>>>>>>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>>>>>>> Tel.:+49 (0)221 496
>>>>>>>>> Mobile: +49 0221 37970 496
>>>>>>>>> rcui at age.mpg.de
>>>>>>>>> www.age.mpg.de
>>>>>>>>>
>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt wrote:
>>>>>>>>> MAKER support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER related archive posts etc. will be related to the latter.
>>>>>>>>>
>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don't know how this compares to GeneMark-ET.
>>>>>>>>>
>>>>>>>>> --Carson
>>>>>>>>>
>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> thanks! It seems that Genemark-ET has a "--training" flag, is that the flag I should use when training or should I just let Genemark also perform the prediction?
>>>>>>>>>>
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>> Dr.
Rongfeng (Ray) Cui
>>>>>>>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>>>>>>>> Tel.:+49 (0)221 496
>>>>>>>>>> Mobile: +49 0221 37970 496
>>>>>>>>>> rcui at age.mpg.de
>>>>>>>>>> www.age.mpg.de
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel wrote:
>>>>>>>>>> Hi Ray,
>>>>>>>>>>
>>>>>>>>>> I think you're on the right track with training Genemark with RNAseq data. It should only change the training steps, which are external to MAKER, but not how MAKER runs Genemark. You'll still give MAKER the path to the "es.mod" file made by Genemark.
>>>>>>>>>>
>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence.
>>>>>>>>>>
>>>>>>>>>> ~Daniel
>>>>>>>>>>
>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I have successfully installed Maker beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictors.
>>>>>>>>>>> When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate Genemark-ET predictions into Maker. Should I run Genemark-ET independent of Maker, then integrate the GFF at some point during the maker process? If so, how should I edit the configuration file? Currently maker has an option called "gmhmm". Should I then train GeneMark by myself with RNAseq data, then feed the hmm to maker?
>>>>>>>>>>>
>>>>>>>>>>> And perhaps an unrelated question is that now Maker beta 3 supports EVM. I'm wondering how EVM is used by Maker (at which step, what does it do), and how does it differ from what Maker is designed for (both reconcile different gene models).
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Ray
>>>>>>>>>>>
>>>>>>>>>>> Dr. Rongfeng (Ray) Cui
>>>>>>>>>>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
>>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher
>>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
>>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
>>>>>>>>>>> Tel.:+49 (0)221 496
>>>>>>>>>>> Mobile: +49 0221 37970 496
>>>>>>>>>>> rcui at age.mpg.de
>>>>>>>>>>> www.age.mpg.de
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> maker-devel mailing list
>>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
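
Carson's description of the final GFF3 at the top of this message (features with source "maker" in column 2, whose gene ID records which predictor the model came from) can be checked with a short script. The following is a minimal sketch and not part of MAKER: the function name, the sample records, and the rule "the predictor is the ID text before the first '-'" are all assumptions made for illustration, based only on Ray's description of the ID prefixes. Adjust the prefix rule to match the IDs in your own file.

```python
from collections import Counter


def tally_gene_id_prefixes(gff3_lines):
    """Count gene features with source 'maker' by the prefix of their ID."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        source, feature_type, attributes = cols[1], cols[2], cols[8]
        if source != "maker" or feature_type != "gene":
            continue  # raw match/match_part evidence is ignored here
        attrs = dict(
            field.split("=", 1) for field in attributes.split(";") if "=" in field
        )
        gene_id = attrs.get("ID", "")
        # Assumption: the predictor name is whatever precedes the first "-".
        counts[gene_id.split("-", 1)[0]] += 1
    return counts


# Made-up records for illustration only; real MAKER IDs may look different.
sample = [
    "##gff-version 3",
    "scaf1\tmaker\tgene\t100\t900\t.\t+\t.\tID=snap-scaf1-gene-0.1;Name=snap-scaf1-gene-0.1",
    "scaf1\tmaker\tgene\t2000\t3500\t.\t-\t.\tID=evm-scaf1-gene-0.2",
    "scaf1\tmaker\tgene\t5000\t7000\t.\t+\t.\tID=augustus-scaf1-gene-0.3",
    "scaf1\tsnap\tmatch\t100\t900\t.\t+\t.\tID=scaf1:hit:1",
    "scaf1\tmaker\tgene\t8000\t9000\t.\t+\t.\tID=snap-scaf1-gene-0.4",
]

counts = tally_gene_id_prefixes(sample)
for prefix, n in sorted(counts.items()):
    print(prefix, n)
```

A tally like this gives a quick view of how often each predictor "won" a locus, which is consistent with the final set being a merged subset rather than a superset of all predictions.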
URL:

From rcui at age.mpg.de Thu Mar 16 10:02:08 2017
From: rcui at age.mpg.de (Ray Cui)
Date: Thu, 16 Mar 2017 17:02:08 +0100
Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints
In-Reply-To: <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu> <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com> <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com> <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com> <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com> <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com> <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com>
Message-ID:

Dear Carson,

thank you for the explanation! Now I see why sometimes it seems that EVM doesn't produce any model for a particular cluster.

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile: +49 0221 37970 496
rcui at age.mpg.de
www.age.mpg.de

On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt wrote:

> Final results with source maker will be of type gene/mRNA/exon/CDS. They
> have been further processed beyond the raw results, and may include
> extensions such as the addition of UTR for example (or hint based
> recomputation in the case of SNAP and Augustus). The gene ID of the maker
> model will let you know the source before additional processing was
> applied. Raw results will also be in the file as type match/match_part and
> source evm/snap/augustus, but are only there for reference purposes (there
> will also be a raw fasta from each source, but only for reference
> purposes). All models compete against each other, and the one best matching
> the evidence is kept. So if SNAP or Augustus scores better than EVM, then
> that model will be kept for that locus.
You can find more detail in the
> MAKER wiki and the MAKER2 paper for how models compete.
>
> So the final result is not a superset, rather a merged subset from each
> potential source.
>
> EVM is not used to obtain a consensus gene model. Its results compete just
> like all other algorithms. This is because when EVM works it produces
> beautiful models that score really well, but when it doesn't work it
> produces either no model or partial models.
>
> --Carson
>
> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote:
>
> Dear Carson,
>
> thank you so much! I am now peeking into the results for the
> finished scaffolds. In the gff file, the gene id confuses me a bit. In this
> file, column 2 is always "maker", but the "ID" attribute in the annotation
> is prefixed with "snap", "maker", "evm", "augustus" etc. Does that mean
> the final annotation is a superset of all gene predictors? If EVM was used
> to obtain a consensus gene model, why would the other models still show up
> in the final result set?
>
> Best Regards,
> Ray
>
> Dr. Rongfeng (Ray) Cui
> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for
> Biology of Ageing
> Wissenschaftlicher MA / Postdoctoral researcher
> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
> Tel.:+49 (0)221 496
> Mobile: +49 0221 37970 496
> rcui at age.mpg.de
> www.age.mpg.de
>
> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote:
>
>> Maybe. I haven't tested this, but it should work. Maker supports labels
>> for input by placing a ":" and a label after each file name.
>>
>> Example ->
>> est=file1.fasta:label_1,file2.fasta:label_2
>>
>> If you label your files, then the label will go into the GFF3. So instead
>> of est2genome in column 2, you will get est2genome:label_1 in column 2.
>> >> As a result, you should be able to add that label to the EVM settings >> like so and it will match column 2 of the GFF3 --> >> evmtrans:est2genome:label_1=10 >> >> I don't know if the label will force any raw analysis to rerun, but >> it shouldn't. >> >> >> --Carson >> >> >> >> On Mar 15, 2017, at 5:13 AM, Ray Cui wrote: >> >> Hi Carson, >> >> currently I am partitioning the protein evidence based on >> phylogenetic relationship into several datasets, supplied as comma >> delimited list. Is it possible then to specify higher weight for >> protein2genome models from closer related species than further related taxa? >> >> Ray >> >> Dr. Rongfeng (Ray) Cui >> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for >> Biology of Ageing >> Wissenschaftlicher MA / Postdoctoral researcher >> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne >> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne >> Tel.:+49 (0)221 496 >> Mobile: +49 0221 37970 496 >> rcui at age.mpg.de >> www.age.mpg.de >> >> >> >> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui wrote: >> >>> Dear Carson, >>> >>> thank you for the pointers! Before running the first round of >>> Maker, I mapped conspecific Trinity assembled proteins (long, "full length" >>> subset) to an earlier version of the genome assembly using my own pipeline >>> and trained Augustus and SNAP that way. I also trained Genemark-ET using >>> TopHat alignments per their instructions. I'm wondering if it will be worth >>> doing a second round, but I guess I will see. >>> >>> It is good to know that MAKER will reuse the old results. >>> >>> Best Regards, >>> Ray >>> >>> Dr. 
Rongfeng (Ray) Cui >>> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for >>> Biology of Ageing >>> Wissenschaftlicher MA / Postdoctoral researcher >>> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne >>> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne >>> Tel.:+49 (0)221 496 >>> Mobile: +49 0221 37970 496 >>> rcui at age.mpg.de >>> www.age.mpg.de >>> >>> >>> >>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt wrote: >>> >>>> You can find lots of info in the devel archives on training. Example --> >>>> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI >>>> >>>> Also example of training SNAP on the wiki --> >>>> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors >>>> >>>> MAKER will reuse old raw results if you rerun in the same directory >>>> (only deleting what would be different given altered settings between >>>> runs). It will see the existing alignments archived in the datastore as raw >>>> reports and just reuse them. The exception is the exonerate >>>> alignments. They are generated relatively quickly compared to the BLAST >>>> runs, so rerunning them is not too much overhead. Also they are not >>>> archived because doing so created IO issues (exonerate is not running in >>>> bulk batches like BLAST, rather as multiple small separate runs for each >>>> polished read, and archiving a lot of small raw reports can occur so fast >>>> when using MPI that it crashes storage servers). So we decided to just not >>>> archive exonerate rather than develop a database-like bundling/compression >>>> mechanism to get around the IO issues. >>>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> On Mar 14, 2017, at 10:44 AM, Ray Cui wrote: >>>> >>>> Hi Carson, >>>> Thanks for your prompt response! >>>> >>>> I have a somewhat unrelated question. 
After the first run of >>>> Maker, I want to train Augustus, SNAP and Genemark-ET using the most >>>> reliable gene models produced in the first round. What would be a good way >>>> to select these gene models? >>>> After retraining the ab initio predictors, I also wonder if >>>> it's necessary to redo all the alignments (blastx, est2genome, >>>> protein2genome etc) in the second iteration, since they are exactly the >>>> same as the first run. Perhaps maker can take in the alignment results from >>>> the previous run? >>>> >>>> Best Regards, >>>> Ray >>>> >>>> Dr. Rongfeng (Ray) Cui >>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for >>>> Biology of Ageing >>>> Wissenschaftlicher MA / Postdoctoral researcher >>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>> Mobile: +49 0221 37970 496 >>>> rcui at age.mpg.de >>>> www.age.mpg.de >>>> >>>> >>>> >>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui wrote: >>>> >>>>> I see. If my evm config looks like this: >>>>> evmab=5 #default weight for source unspecified ab initio predictions >>>>> evmab:snap=5 #weight for snap sourced predictions >>>>> evmab:augustus=10 #weight for augustus sourced predictions >>>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions >>>>> evmab:genemark=5 #weight for genemark sourced predictions >>>>> >>>>> and Column 2 in the genemark.gff is "GeneMark.hmm" , then the value >>>>> from "evmab" (=5) will be used, is that correct? >>>>> >>>>> Best Regards, >>>>> Ray >>>>> >>>>> Dr. 
Rongfeng (Ray) Cui >>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute >>>>> for Biology of Ageing >>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>> Mobile: +49 0221 37970 496 >>>>> rcui at age.mpg.de >>>>> www.age.mpg.de >>>>> >>>>> >>>>> >>>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt >>>>> wrote: >>>>> >>>>>> Column 2 in the GFF3 file is the source column. It is used to specify >>>>>> the source fo the data. That column will also be used by EVM to bin >>>>>> features by their source and apply weights based on source. >>>>>> >>>>>> ?Carson >>>>>> >>>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui wrote: >>>>>> >>>>>> Thanks! I didn't know you can also name the gff, but I think using >>>>>> the default is fine, that's what I'm doing now. >>>>>> >>>>>> >>>>>> Best Regards, >>>>>> Ray >>>>>> >>>>>> Dr. Rongfeng (Ray) Cui >>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute >>>>>> for Biology of Ageing >>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>> Mobile: +49 0221 37970 496 >>>>>> rcui at age.mpg.de >>>>>> www.age.mpg.de >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> These are set in the maker_evm.ctl file. >>>>>>> >>>>>>> Use whatever you used in the source column of the input GFF3. 
For >>>>>>> example if column 2 is set as GENEMARK, then do this ?> >>>>>>> evmab:GENEMARK=7 >>>>>>> >>>>>>> This also works ?> >>>>>>> evmab:pred_gff:GENEMARK=7 >>>>>>> >>>>>>> Or just set the default ?> >>>>>>> evmab=7 >>>>>>> >>>>>>> ?Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui wrote: >>>>>>> >>>>>>> Dear Carson, >>>>>>> >>>>>>> I think it may be the most straight foward to input the GFF3 >>>>>>> instead. >>>>>>> >>>>>>> What is the correct way of setting a weight for the EVM step >>>>>>> for this GFF3 models passed through the pred_gff option? >>>>>>> >>>>>>> Ray >>>>>>> >>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute >>>>>>> for Biology of Ageing >>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>> Mobile: +49 0221 37970 496 >>>>>>> rcui at age.mpg.de >>>>>>> www.age.mpg.de >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt >>>>>>> wrote: >>>>>>> >>>>>>>> It may work as is as long as you don?t need any of the additional >>>>>>>> options that have been added. If not, you can also just run it outside of >>>>>>>> MAKER then provide the result in GFF3 format to pred_gff. >>>>>>>> >>>>>>>> ?Carson >>>>>>>> >>>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui wrote: >>>>>>>> >>>>>>>> I see. Is there any recent plans to incorporate it into Maker? >>>>>>>> >>>>>>>> If not, I could try to see if I can adapt the current Maker script. >>>>>>>> >>>>>>>> Ray >>>>>>>> >>>>>>>> Dr. 
Rongfeng (Ray) Cui >>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute >>>>>>>> for Biology of Ageing >>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>> rcui at age.mpg.de >>>>>>>> www.age.mpg.de >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Yes. This is a recent update. It?s an attempt to merge GeneMark-ET >>>>>>>>> and GeneMark-EP into GeneMark-ES scripts. >>>>>>>>> >>>>>>>>> ?Carson >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui wrote: >>>>>>>>> >>>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap. >>>>>>>>> >>>>>>>>> I think there must have been a big update between different >>>>>>>>> Genemark versions. It seems that they now also supports evidence being fed >>>>>>>>> into the prediction stage. 
>>>>>>>>> >>>>>>>>> The name of the latest version of the genemark script has been >>>>>>>>> changed to "gmes_petap.pl", with the following command lines >>>>>>>>> options: >>>>>>>>> >>>>>>>>> Usage: /beegfs/group_dv/software/sou >>>>>>>>> rce/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] >>>>>>>>> --sequence [filename] >>>>>>>>> >>>>>>>>> GeneMark-ES Suite version 4.33 >>>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) >>>>>>>>> based training and prediction >>>>>>>>> >>>>>>>>> Input sequence/s should be in FASTA format >>>>>>>>> >>>>>>>>> Algorithm options >>>>>>>>> --ES to run self-training >>>>>>>>> --fungus to run algorithm with branch point model (most >>>>>>>>> useful for fungal genomes) >>>>>>>>> --ET [filename]; to run training with introns >>>>>>>>> coordinates from RNA-Seq read alignments (GFF format) >>>>>>>>> --et_score [number]; 4 (default) minimum score of intron in >>>>>>>>> initiation of the ET algorithm >>>>>>>>> --evidence [filename]; to use in prediction external >>>>>>>>> evidence (RNA or protein) mapped to genome >>>>>>>>> --training_only to run only training step >>>>>>>>> --prediction_only to run only prediction step >>>>>>>>> --predict_with [filename]; predict genes using this file species >>>>>>>>> specific parameters (bypass regular training and prediction steps) >>>>>>>>> >>>>>>>>> Sequence pre-processing options >>>>>>>>> --max_contig [number]; 5000000 (default) will split input >>>>>>>>> genomic sequence into contigs shorter then max_contig >>>>>>>>> --min_contig [number]; 50000 (default); will ignore contigs >>>>>>>>> shorter then min_contig in training >>>>>>>>> --max_gap [number]; 5000 (default); will split sequence at >>>>>>>>> gaps longer than max_gap >>>>>>>>> Letters 'n' and 'N' are interpreted as standing >>>>>>>>> within gaps >>>>>>>>> --max_mask [number]; 5000 (default); will split sequence at >>>>>>>>> repeats longer then max_mask >>>>>>>>> Letters 'x' and 'X' are interpreted as results of >>>>>>>>> 
hard masking of repeats >>>>>>>>> --soft_mask [number] to indicate that lowercase letters stand >>>>>>>>> for repeats; utilize only lowercase repeats longer than specified length >>>>>>>>> >>>>>>>>> Run options >>>>>>>>> --cores [number]; 1 (default) to run program with >>>>>>>>> multiple threads >>>>>>>>> --pbs to run on cluster with PBS support >>>>>>>>> --v verbose >>>>>>>>> >>>>>>>>> Customizing parameters: >>>>>>>>> --max_intron [number]; default 10000 (3000 fungi), >>>>>>>>> maximum length of intron >>>>>>>>> --max_intergenic [number]; default 10000, maximum length of >>>>>>>>> intergenic regions >>>>>>>>> --min_gene_prediction [number]; default 300 (120 fungi) minimum >>>>>>>>> allowed gene length in prediction step >>>>>>>>> >>>>>>>>> Developer options: >>>>>>>>> --usr_cfg [filename]; to customize configuration file >>>>>>>>> --ini_mod [filename]; use this file with parameters for >>>>>>>>> algorithm initiation >>>>>>>>> --test_set [filename]; to evaluate prediction accuracy on >>>>>>>>> the given test set >>>>>>>>> --key_bin >>>>>>>>> --debug >>>>>>>>> # ------------------- >>>>>>>>> >>>>>>>>> >>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck >>>>>>>>> Institute for Biology of Ageing >>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>>> rcui at age.mpg.de >>>>>>>>> www.age.mpg.de >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Also note that the gmhmme3 executable distributed with different >>>>>>>>>> flavors of genemark has had the same name but has been quite different in >>>>>>>>>> both command line structure and output between flavors. 
>>>>>>>>>> >>>>>>>>>> ?Carson >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui wrote: >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters >>>>>>>>>> automatically set by Maker when calling Genemark? >>>>>>>>>> If you can point me to the part of the maker source code that >>>>>>>>>> construct the final genemark command line I can also take a look. >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Ray >>>>>>>>>> >>>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck >>>>>>>>>> Institute for Biology of Ageing >>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>>>> rcui at age.mpg.de >>>>>>>>>> www.age.mpg.de >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt >>>>>>>>> > wrote: >>>>>>>>>> >>>>>>>>>>> The names of scripts used are listed in the maker_exe.ctl file. >>>>>>>>>>> It depends on if formatting or any flags have changed between versions. >>>>>>>>>>> >>>>>>>>>>> ?Carson >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui wrote: >>>>>>>>>>> >>>>>>>>>>> Dear Carson, >>>>>>>>>>> >>>>>>>>>>> I have now run GeneMark-ET, and it produces a trained >>>>>>>>>>> .mod file. I think it can be then passed to Maker. Do you know what is the >>>>>>>>>>> final constructed command line in Maker that calls genemark? 
Genemark-et >>>>>>>>>>> and es use the same perl script so one probably only needs to use the >>>>>>>>>>> --prediction and --predict_with xxx.mod options to predict genes using >>>>>>>>>>> the species specific parameters (bypassing regular training and prediction >>>>>>>>>>> steps) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Best Regards, >>>>>>>>>>> Ray >>>>>>>>>>> >>>>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck >>>>>>>>>>> Institute for Biology of Ageing >>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>>>>> rcui at age.mpg.de >>>>>>>>>>> www.age.mpg.de >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt >>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>>> MAKER was support was designed with GeneMark-ES. It may or may >>>>>>>>>>>> not work with GeneMark-ET. So any MAKER related archive posts etc. will be >>>>>>>>>>>> related to the latter. >>>>>>>>>>>> >>>>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let >>>>>>>>>>>> it run. It would then produce several files and output directories. The >>>>>>>>>>>> es.mod file was the one you provided to MAKER. I don?t know how this >>>>>>>>>>>> compares to GeneMark-ET. >>>>>>>>>>>> >>>>>>>>>>>> ?Carson >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Daniel, >>>>>>>>>>>> >>>>>>>>>>>> thanks! It seems that Genemark-ET has a "--training" >>>>>>>>>>>> flag, is that the flag I should use when training or should I just let >>>>>>>>>>>> Genemark also perform the prediction? >>>>>>>>>>>> >>>>>>>>>>>> Ray >>>>>>>>>>>> >>>>>>>>>>>> Dr. 
Rongfeng (Ray) Cui >>>>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck >>>>>>>>>>>> Institute for Biology of Ageing >>>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>>>>>> rcui at age.mpg.de >>>>>>>>>>>> www.age.mpg.de >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Ray, >>>>>>>>>>>>> >>>>>>>>>>>>> I think you?re on the right track with training Genemark with >>>>>>>>>>>>> RNAseq data. It should only change the training steps, which are external >>>>>>>>>>>>> to MAKER, but not how MAKER runs Genemark. You?ll still give MAKER the path >>>>>>>>>>>>> to the ?es.mod" file made by Genemark. >>>>>>>>>>>>> >>>>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a >>>>>>>>>>>>> control file for EVM, in which you set your weights for the various inputs, >>>>>>>>>>>>> and then MAKER runs EVM alongside all the other gene predictors and chooses >>>>>>>>>>>>> the model that is best supported by the evidence. >>>>>>>>>>>>> >>>>>>>>>>>>> ~Daniel >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hello, >>>>>>>>>>>>> >>>>>>>>>>>>> I have sucessfully installed Maker beta 3, working >>>>>>>>>>>>> with both Augustus and SNAP. I also want to try adding GeneMark-ES to the >>>>>>>>>>>>> ab initio predictor. >>>>>>>>>>>>> When I read the GeneMark-ES manual, it says that one >>>>>>>>>>>>> can use RNAseq data to aid training. I'm wondering what would be the best >>>>>>>>>>>>> way to integrate Genemark-ET predictions into Maker. 
Should I run >>>>>>>>>>>>> Genemark-ET independent of Maker, then integrate the GFF at some point >>>>>>>>>>>>> during the maker process? If so, how should I edit the configuration file? >>>>>>>>>>>>> Currently maker has an option called "gmhmm". Should I then train GeneMark >>>>>>>>>>>>> by myself with RNAseq data, then feed the hmm to maker? >>>>>>>>>>>>> >>>>>>>>>>>>> And perhaps an unrelated question is that now Maker >>>>>>>>>>>>> beta 3 supports EVM. I'm wondering how EVM is used by Maker (at which step, >>>>>>>>>>>>> what does it do), and how does it differ from what Maker is designed for >>>>>>>>>>>>> (both reconciles different gene models). >>>>>>>>>>>>> >>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>> Ray >>>>>>>>>>>>> >>>>>>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck >>>>>>>>>>>>> Institute for Biology of Ageing >>>>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>>>>> Tel.:+49 (0)221 496 <+49%20221%20496> >>>>>>>>>>>>> Mobile: +49 0221 37970 496 >>>>>>>>>>>>> rcui at age.mpg.de >>>>>>>>>>>>> www.age.mpg.de >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> maker-devel mailing list >>>>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yand >>>>>>>>>>>>> ell-lab.org >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> maker-devel mailing list >>>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yand >>>>>>>>>>>> ell-lab.org >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> > > 
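A minimal sketch of the point made in the thread above, that EVM weights in maker_evm.ctl are keyed on whatever string sits in column 2 (the source column) of the GFF3 passed through pred_gff. This is plain Python, not part of MAKER; the function names, the example records and the weight of 7 are illustrative assumptions. It only lists the sources present in a file and prints candidate lines in the `evmab:SOURCE=WEIGHT` form shown in the thread, so the key can be checked against the file before a run.

```python
from collections import Counter


def gff3_sources(lines):
    """Count the values in column 2 (source) of tab-delimited GFF3 lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and GFF3 directives/comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


def evm_weight_lines(sources, prefix="evmab", weight=7):
    """Build candidate control-file lines, one per source (weight is a placeholder)."""
    return [f"{prefix}:{src}={weight}" for src in sorted(sources)]


# Hypothetical records resembling a GeneMark GFF3 whose source is "GeneMark.hmm".
example = [
    "##gff-version 3\n",
    "scaffold1\tGeneMark.hmm\tmatch\t100\t900\t.\t+\t.\tID=gm1\n",
    "scaffold1\tGeneMark.hmm\tmatch_part\t100\t400\t.\t+\t.\tID=gm1.1;Parent=gm1\n",
]

sources = gff3_sources(example)
for ctl_line in evm_weight_lines(sources):
    print(ctl_line)
```

For a real file, `gff3_sources(open("genemark.gff"))` would do the same tally (the file name is only an example). Whether MAKER matches the source string case-sensitively, or falls back to the bare `evmab` default when no source-specific key matches, is not confirmed in the thread, so the sketch makes no assumption about either.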
From carsonhh at gmail.com Thu Mar 16 11:30:16 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 16 Mar 2017 11:30:16 -0600 Subject: [maker-devel] Using GeneMark-ET with RNAseq intron hints In-Reply-To: References: <2A8AEAD2-D9C9-4F96-8A6C-A11B55FA0F26@mail.ufl.edu> <52CD5438-F990-4D5E-AED1-7E86101DE3B5@gmail.com> <262A4EFA-B165-4B6C-8518-93F325E1D222@gmail.com> <5BF01882-6E2D-4202-A34A-8363406AEF9C@gmail.com> <1C6959D2-5A47-486C-B552-39333509F56A@gmail.com> <1D07560D-76DA-4CE0-ABE7-F3B7BDCC8614@gmail.com> <2D061BF0-C031-469A-86BF-5A181CDE19FB@gmail.com> Message-ID: 1. Verify that the issue is not being caused by hints from evidence (i.e. that you aren't feeding fused mRNA-seq assemblies or protein evidence). Fused evidence will result in hints that fuse models. 2. If you still have an issue, then drop SNAP. Not all predictors work well on all genomes. Also no one can post to the Google group. It's just for archival. All messages have to go to the mailing list here, and they then get archived on Google --> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org The mailing list log shows that you requested to unsubscribe earlier today. --Carson > On Mar 16, 2017, at 11:22 AM, Ray Cui wrote: > > Hi Carson, > > for some reason I can't seem to post anymore on the google group. > > After looking at the results, it appears that SNAP performs poorly compared to genemark-ET and augustus. It looks like it's very prone to fusing neighboring genes and getting false positives. Is that a general thing you see in vertebrate genomes with SNAP? I saw that you didn't recommend SNAP for primates, perhaps the issue is similar? > > Attached you can see a screen shot of IGV browser, with all evidence tracks separated. > > Ray > > Dr. 
Rongfeng (Ray) Cui > Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing > Wissenschaftlicher MA / Postdoctoral researcher > Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne > Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne > Tel.:+49 (0)221 496 > Mobile: +49 0221 37970 496 <> > rcui at age.mpg.de > www.age.mpg.de > > > > On Thu, Mar 16, 2017 at 5:02 PM, Ray Cui > wrote: > Dear Carson, > > thank you for the explanation! Now I see why sometimes it seems that EVM doesn't produce any model for a particular cluster. > > Best Regards, > Ray > > Dr. Rongfeng (Ray) Cui > Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing > Wissenschaftlicher MA / Postdoctoral researcher > Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne > Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne > Tel.:+49 (0)221 496 > Mobile: +49 0221 37970 496 <> > rcui at age.mpg.de > www.age.mpg.de > > > > On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt > wrote: > Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR for example (or hint based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied. Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but are only there for reference purposes (there will also be a raw fasta from each source, but only for reference purposes). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail in the MAKER wiki and the MAKER2 paper for how models compete. > > So the final result is not a superset, rather a merged subset from each potential source. 
> > EVM is not used to obtain a consensus gene model. Its results compete just like all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn't work it produces either no model or partial models. > > --Carson > > >> On Mar 16, 2017, at 3:07 AM, Ray Cui wrote: >> >> Dear Carson, >> >> thank you so much! I am now peeking into the results for the finished scaffolds. In the gff file, the gene id confuses me a bit. In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm" , "augustus" etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set? >> >> Best Regards, >> Ray >> >> Dr. Rongfeng (Ray) Cui >> Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing >> Wissenschaftlicher MA / Postdoctoral researcher >> Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne >> Postal address: Postfach 41 06 23, D-50866 Köln / Cologne >> Tel.:+49 (0)221 496 >> Mobile: +49 0221 37970 496 >> rcui at age.mpg.de >> www.age.mpg.de >> >> >> >> On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt wrote: >> Maybe. I haven't tested this, but it should work. Maker supports labels for input by placing a ":" and a label after each file name. >> >> Example --> >> est=file1.fasta:label_1,file2.fasta:label_2 >> >> If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2. >> >> As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3 --> >> evmtrans:est2genome:label_1=10 >> >> I don't know if the label will force any raw analysis to rerun, but it shouldn't. 
>> >> >> ?Carson >> >> >> >>> On Mar 15, 2017, at 5:13 AM, Ray Cui > wrote: >>> >>> Hi Carson, >>> >>> currently I am partitioning the protein evidence based on phylogenetic relationship into several datasets, supplied as comma delimited list. Is it possible then to specify higher weight for protein2genome models from closer related species than further related taxa? >>> >>> Ray >>> >>> Dr. Rongfeng (Ray) Cui >>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>> Wissenschaftlicher MA / Postdoctoral researcher >>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>> Tel.:+49 (0)221 496 >>> Mobile: +49 0221 37970 496 <> >>> rcui at age.mpg.de >>> www.age.mpg.de >>> >>> >>> >>> On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui > wrote: >>> Dear Carson, >>> >>> thank you for the pointers! Before running the first round of Maker, I mapped conspecific Trinity assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained Genemark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see. >>> >>> It is good to know that MAKER will reuse the old results. >>> >>> Best Regards, >>> Ray >>> >>> Dr. Rongfeng (Ray) Cui >>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>> Wissenschaftlicher MA / Postdoctoral researcher >>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>> Tel.:+49 (0)221 496 >>> Mobile: +49 0221 37970 496 <> >>> rcui at age.mpg.de >>> www.age.mpg.de >>> >>> >>> >>> On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt > wrote: >>> You can find lots of info in the devel archives on training. 
Example --> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI >>> >>> Also example of training SNAP on the wiki --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors >>> >>> MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them. The exception is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. Also they are not archived because doing so created IO issues (exonerate is not running in bulk batches like BLAST, rather as multiple small separate runs for each polished read, and archiving a lot of small raw reports can occur so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate rather than develop a database-like bundling/compression mechanism to get around the IO issues. >>> >>> Thanks, >>> Carson >>> >>> >>>> On Mar 14, 2017, at 10:44 AM, Ray Cui wrote: >>>> >>>> Hi Carson, >>>> Thanks for your prompt response! >>>> >>>> I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models? >>>> After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc) in the second iteration, since they are exactly the same as the first run. Perhaps maker can take in the alignment results from the previous run? >>>> >>>> Best Regards, >>>> Ray >>>> >>>> Dr. 
Rongfeng (Ray) Cui >>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>> Wissenschaftlicher MA / Postdoctoral researcher >>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>> Tel.:+49 (0)221 496 >>>> Mobile: +49 0221 37970 496 <> >>>> rcui at age.mpg.de >>>> www.age.mpg.de >>>> >>>> >>>> >>>> On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui > wrote: >>>> I see. If my evm config looks like this: >>>> evmab=5 #default weight for source unspecified ab initio predictions >>>> evmab:snap=5 #weight for snap sourced predictions >>>> evmab:augustus=10 #weight for augustus sourced predictions >>>> evmab:fgenesh=10 #weight for fgenesh sourced predictions >>>> evmab:genemark=5 #weight for genemark sourced predictions >>>> >>>> and Column 2 in the genemark.gff is "GeneMark.hmm" , then the value from "evmab" (=5) will be used, is that correct? >>>> >>>> Best Regards, >>>> Ray >>>> >>>> Dr. Rongfeng (Ray) Cui >>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>> Wissenschaftlicher MA / Postdoctoral researcher >>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>> Tel.:+49 (0)221 496 >>>> Mobile: +49 0221 37970 496 <> >>>> rcui at age.mpg.de >>>> www.age.mpg.de >>>> >>>> >>>> >>>> On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt > wrote: >>>> Column 2 in the GFF3 file is the source column. It is used to specify the source fo the data. That column will also be used by EVM to bin features by their source and apply weights based on source. >>>> >>>> ?Carson >>>> >>>>> On Mar 14, 2017, at 10:26 AM, Ray Cui > wrote: >>>>> >>>>> Thanks! I didn't know you can also name the gff, but I think using the default is fine, that's what I'm doing now. >>>>> >>>>> >>>>> Best Regards, >>>>> Ray >>>>> >>>>> Dr. 
Rongfeng (Ray) Cui >>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>> Tel.:+49 (0)221 496 >>>>> Mobile: +49 0221 37970 496 <> >>>>> rcui at age.mpg.de >>>>> www.age.mpg.de >>>>> >>>>> >>>>> >>>>> On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt > wrote: >>>>> >>>>> These are set in the maker_evm.ctl file. >>>>> >>>>> Use whatever you used in the source column of the input GFF3. For example if column 2 is set as GENEMARK, then do this ?> >>>>> evmab:GENEMARK=7 >>>>> >>>>> This also works ?> >>>>> evmab:pred_gff:GENEMARK=7 >>>>> >>>>> Or just set the default ?> >>>>> evmab=7 >>>>> >>>>> ?Carson >>>>> >>>>> >>>>> >>>>> >>>>>> On Mar 10, 2017, at 8:48 AM, Ray Cui > wrote: >>>>>> >>>>>> Dear Carson, >>>>>> >>>>>> I think it may be the most straight foward to input the GFF3 instead. >>>>>> >>>>>> What is the correct way of setting a weight for the EVM step for this GFF3 models passed through the pred_gff option? >>>>>> >>>>>> Ray >>>>>> >>>>>> Dr. Rongfeng (Ray) Cui >>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>> Tel.:+49 (0)221 496 >>>>>> Mobile: +49 0221 37970 496 <> >>>>>> rcui at age.mpg.de >>>>>> www.age.mpg.de >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt > wrote: >>>>>> It may work as is as long as you don?t need any of the additional options that have been added. If not, you can also just run it outside of MAKER then provide the result in GFF3 format to pred_gff. >>>>>> >>>>>> ?Carson >>>>>> >>>>>>> On Feb 20, 2017, at 2:51 AM, Ray Cui > wrote: >>>>>>> >>>>>>> I see. 
Is there any recent plans to incorporate it into Maker? >>>>>>> >>>>>>> If not, I could try to see if I can adapt the current Maker script. >>>>>>> >>>>>>> Ray >>>>>>> >>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>> Tel.:+49 (0)221 496 >>>>>>> Mobile: +49 0221 37970 496 <> >>>>>>> rcui at age.mpg.de >>>>>>> www.age.mpg.de >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt > wrote: >>>>>>> Yes. This is a recent update. It?s an attempt to merge GeneMark-ET and GeneMark-EP into GeneMark-ES scripts. >>>>>>> >>>>>>> ?Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Feb 20, 2017, at 2:43 AM, Ray Cui > wrote: >>>>>>>> >>>>>>>> I see, I will take a look at the wrapper gmhmm_wrap. >>>>>>>> >>>>>>>> I think there must have been a big update between different Genemark versions. It seems that they now also supports evidence being fed into the prediction stage. 
>>>>>>>> >>>>>>>> The name of the latest version of the genemark script has been changed to "gmes_petap.pl ", with the following command lines options: >>>>>>>> >>>>>>>> Usage: /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl [options] --sequence [filename] >>>>>>>> >>>>>>>> GeneMark-ES Suite version 4.33 >>>>>>>> includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction >>>>>>>> >>>>>>>> Input sequence/s should be in FASTA format >>>>>>>> >>>>>>>> Algorithm options >>>>>>>> --ES to run self-training >>>>>>>> --fungus to run algorithm with branch point model (most useful for fungal genomes) >>>>>>>> --ET [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format) >>>>>>>> --et_score [number]; 4 (default) minimum score of intron in initiation of the ET algorithm >>>>>>>> --evidence [filename]; to use in prediction external evidence (RNA or protein) mapped to genome >>>>>>>> --training_only to run only training step >>>>>>>> --prediction_only to run only prediction step >>>>>>>> --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps) >>>>>>>> >>>>>>>> Sequence pre-processing options >>>>>>>> --max_contig [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig >>>>>>>> --min_contig [number]; 50000 (default); will ignore contigs shorter then min_contig in training >>>>>>>> --max_gap [number]; 5000 (default); will split sequence at gaps longer than max_gap >>>>>>>> Letters 'n' and 'N' are interpreted as standing within gaps >>>>>>>> --max_mask [number]; 5000 (default); will split sequence at repeats longer then max_mask >>>>>>>> Letters 'x' and 'X' are interpreted as results of hard masking of repeats >>>>>>>> --soft_mask [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length >>>>>>>> >>>>>>>> Run 
options >>>>>>>> --cores [number]; 1 (default) to run program with multiple threads >>>>>>>> --pbs to run on cluster with PBS support >>>>>>>> --v verbose >>>>>>>> >>>>>>>> Customizing parameters: >>>>>>>> --max_intron [number]; default 10000 (3000 fungi), maximum length of intron >>>>>>>> --max_intergenic [number]; default 10000, maximum length of intergenic regions >>>>>>>> --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step >>>>>>>> >>>>>>>> Developer options: >>>>>>>> --usr_cfg [filename]; to customize configuration file >>>>>>>> --ini_mod [filename]; use this file with parameters for algorithm initiation >>>>>>>> --test_set [filename]; to evaluate prediction accuracy on the given test set >>>>>>>> --key_bin >>>>>>>> --debug >>>>>>>> # ------------------- >>>>>>>> >>>>>>>> >>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>> Tel.:+49 (0)221 496 >>>>>>>> Mobile: +49 0221 37970 496 <> >>>>>>>> rcui at age.mpg.de >>>>>>>> www.age.mpg.de >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt > wrote: >>>>>>>> Also note that the gmhmme3 executable distributed with different flavors of genemark has had the same name but has been quite different in both command line structure and output between flavors. >>>>>>>> >>>>>>>> ?Carson >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Feb 20, 2017, at 2:08 AM, Ray Cui > wrote: >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> Are the "--max_intron" and "--max_intergenic" parameters automatically set by Maker when calling Genemark? >>>>>>>>> If you can point me to the part of the maker source code that construct the final genemark command line I can also take a look. 
>>>>>>>>> >>>>>>>>> Best Regards, >>>>>>>>> Ray >>>>>>>>> >>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>> Tel.:+49 (0)221 496 >>>>>>>>> Mobile: +49 0221 37970 496 <> >>>>>>>>> rcui at age.mpg.de >>>>>>>>> www.age.mpg.de >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt > wrote: >>>>>>>>> The names of scripts used are listed in the maker_exe.ctl file. It depends on if formatting or any flags have changed between versions. >>>>>>>>> >>>>>>>>> ?Carson >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Feb 20, 2017, at 1:59 AM, Ray Cui > wrote: >>>>>>>>>> >>>>>>>>>> Dear Carson, >>>>>>>>>> >>>>>>>>>> I have now run GeneMark-ET, and it produces a trained .mod file. I think it can be then passed to Maker. Do you know what is the final constructed command line in Maker that calls genemark? Genemark-et and es use the same perl script so one probably only needs to use the --prediction and --predict_with xxx.mod options to predict genes using the species specific parameters (bypassing regular training and prediction steps) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Ray >>>>>>>>>> >>>>>>>>>> Dr. 
Rongfeng (Ray) Cui >>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>> Tel.:+49 (0)221 496 >>>>>>>>>> Mobile: +49 0221 37970 496 <> >>>>>>>>>> rcui at age.mpg.de >>>>>>>>>> www.age.mpg.de >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt > wrote: >>>>>>>>>> MAKER was support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER related archive posts etc. will be related to the latter. >>>>>>>>>> >>>>>>>>>> With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don?t know how this compares to GeneMark-ET. >>>>>>>>>> >>>>>>>>>> ?Carson >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Feb 14, 2017, at 8:44 AM, Ray Cui > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Daniel, >>>>>>>>>>> >>>>>>>>>>> thanks! It seems that Genemark-ET has a "--training" flag, is that the flag I should use when training or should I just let Genemark also perform the prediction? >>>>>>>>>>> >>>>>>>>>>> Ray >>>>>>>>>>> >>>>>>>>>>> Dr. 
Rongfeng (Ray) Cui >>>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>>> Tel.:+49 (0)221 496 >>>>>>>>>>> Mobile: +49 0221 37970 496 <> >>>>>>>>>>> rcui at age.mpg.de >>>>>>>>>>> www.age.mpg.de >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel > wrote: >>>>>>>>>>> Hi Ray, >>>>>>>>>>> >>>>>>>>>>> I think you?re on the right track with training Genemark with RNAseq data. It should only change the training steps, which are external to MAKER, but not how MAKER runs Genemark. You?ll still give MAKER the path to the ?es.mod" file made by Genemark. >>>>>>>>>>> >>>>>>>>>>> For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence. >>>>>>>>>>> >>>>>>>>>>> ~Daniel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Feb 14, 2017, at 7:38 AM, Ray Cui > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> I have sucessfully installed Maker beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictor. >>>>>>>>>>>> When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate Genemark-ET predictions into Maker. Should I run Genemark-ET independent of Maker, then integrate the GFF at some point during the maker process? If so, how should I edit the configuration file? Currently maker has an option called "gmhmm". Should I then train GeneMark by myself with RNAseq data, then feed the hmm to maker? 
>>>>>>>>>>>> >>>>>>>>>>>> And perhaps an unrelated question is that now Maker beta 3 supports EVM. I'm wondering how EVM is used by Maker (at which step, what does it do), and how does it differ from what Maker is designed for (both reconciles different gene models). >>>>>>>>>>>> >>>>>>>>>>>> Best Regards, >>>>>>>>>>>> Ray >>>>>>>>>>>> >>>>>>>>>>>> Dr. Rongfeng (Ray) Cui >>>>>>>>>>>> Max-Planck-Institut f?r Biologie des Alterns / Max Planck Institute for Biology of Ageing >>>>>>>>>>>> Wissenschaftlicher MA / Postdoctoral researcher >>>>>>>>>>>> Office: Joseph-Stelzmann 9b, D-50931 K?ln / Cologne >>>>>>>>>>>> Postal address: Postfach 41 06 23, D-50866 K?ln / Cologne >>>>>>>>>>>> Tel.:+49 (0)221 496 >>>>>>>>>>>> Mobile: +49 0221 37970 496 <> >>>>>>>>>>>> rcui at age.mpg.de >>>>>>>>>>>> www.age.mpg.de >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> maker-devel mailing list >>>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> maker-devel mailing list >>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>> >>> >>> >> >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Thu Mar 16 21:48:10 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Thu, 16 Mar 2017 23:48:10 -0400 Subject: [maker-devel] split genes Message-ID: Hello: If one gene was covered by two contigs, sometimes we may predicted two genes. I wonder how Maker deal with such conditions? Even Maker tried to reduce such cases, they can not be completely avoid. 
So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?

As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the MAKER pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, could the predicted genes then be merged into one?

Thanks

Best
Quanwei

From carsonhh at gmail.com Fri Mar 17 09:21:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Mar 2017 09:21:10 -0600
Subject: [maker-devel] split genes
In-Reply-To: 
References: 
Message-ID: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>

MAKER will not try to predict a gene across contigs because it is too difficult to determine contig order. If you are able to determine the order, then it is best to merge the contigs into a single scaffold before annotating rather than try to produce split models in GFF3.

--Carson

> On Mar 16, 2017, at 9:48 PM, Quanwei Zhang wrote:
>
> Hello:
>
> If one gene is covered by two contigs, sometimes two genes may be predicted. I wonder how MAKER deals with such cases?
> Even if MAKER tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?
>
> As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the MAKER pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, could the predicted genes then be merged into one?
> > Thanks > > Best > Quanwei > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From qwzhang0601 at gmail.com Fri Mar 17 11:49:06 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Fri, 17 Mar 2017 13:49:06 -0400 Subject: [maker-devel] split genes In-Reply-To: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com> References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com> Message-ID: Thank you for your explanation. But do you have any suggestions on such issues? Is there any tools to detect such split genes or any other tool can even further improve the gene models obtained by Maker? Thanks. Best Quanwei 2017-03-17 11:21 GMT-04:00 Carson Holt : > MAKER will not try and predict a gene across contigs because it it too > difficult to determine contig order. If you are able to determine order, > then it is best to merge the contigs into a single scaffold before > annotating rather than try and produce split models in GFF3. > > ?Carson > > > On Mar 16, 2017, at 9:48 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > If one gene was covered by two contigs, sometimes we may predicted two > genes. I wonder how Maker deal with such conditions? > > Even Maker tried to reduce such cases, they can not be completely avoid. > So I wonder whether there is any way or any tool to find such split genes > (one gene split into two contigs and predicted as two genes)? > > > > As we know, we can also provide protein sequences and transcript > assembly as evidences. Can a protein sequence or transcript assembly rescue > the split genes in Maker pipe line? For example, if one transcript cover > 40% of predicted genes predicted in two contigs, then merge the predicted > genes into one? 
> > > > Thanks > > > > Best > > Quanwei > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Fri Mar 17 16:37:16 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Fri, 17 Mar 2017 18:37:16 -0400 Subject: [maker-devel] putative gene function by mapping to UniProt/Swiss-prot set Message-ID: Hello: I have a questions about the assigning putative gene function by mapping to UniProt/Swiss-prot gene set (described in the protocol published in 2014). Here, for each of the gene model from Maker, the pipeline will find the most similar protein in UniProt/Swiss-prot and assign the function of the matched protein, right? It does not require best-reciprocal hit, right? Thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Mon Mar 20 07:03:10 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Mon, 20 Mar 2017 09:03:10 -0400 Subject: [maker-devel] putative gene function by mapping to UniProt/Swiss-prot set In-Reply-To: References: Message-ID: Hi Quanwei, Correct. Just the best hit when blasting the MAKER generated fasta sequences to Swiss-prot. Thanks, Mike > On Mar 17, 2017, at 6:37 PM, Quanwei Zhang wrote: > > Hello: > > I have a questions about the assigning putative gene function by mapping to UniProt/Swiss-prot gene set (described in the protocol published in 2014). > Here, for each of the gene model from Maker, the pipeline will find the most similar protein in UniProt/Swiss-prot and assign the function of the matched protein, right? > It does not require best-reciprocal hit, right? 
>
> Thanks
> Best
> Quanwei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From qwzhang0601 at gmail.com Mon Mar 20 11:09:28 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Mon, 20 Mar 2017 13:09:28 -0400
Subject: [maker-devel] evidence of transcript assembly
Message-ID: 

Hello:

I am using MAKER2 to do gene annotation on a new rodent species. I have found some published RNA-seq data with selected open reading frames. Generally, the authors produced the transcript assemblies with Trinity, then mapped the raw transcript assemblies to the mouse genome and selected those with full or partial coverage of mouse genes.

I have a question about transcript assembly evidence for MAKER. Which do you think is the best choice of evidence for MAKER2?
(1) All the Trinity transcript assemblies?
(2) Trinity transcript assemblies that fully cover the mouse genes?
(3) Trinity transcript assemblies that either fully or partly cover the mouse genes?

Many thanks

Best
Quanwei

From glenna.kramer at utoronto.ca Mon Mar 20 19:37:45 2017
From: glenna.kramer at utoronto.ca (Glenna Kramer)
Date: Tue, 21 Mar 2017 01:37:45 +0000
Subject: [maker-devel] GFF no longer valid after renaming genes
Message-ID: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>

Hi there,

I am hoping that you can give me some assistance with finishing up my MAKER-annotated genome for submission. I have been able to rename the genes for GenBank submission using Support Protocol 2 in the paper by Campbell et al., "Genome Annotation and Curation Using MAKER and MAKER-P", Curr Protoc Bioinformatics. 2014; 48: 4.11.1-4.11.39 (PMC4286374). I have also been able to use Support Protocol 3 from that same paper to assign putative gene functions. However, I am running into problems when I try to convert the GFF file to the tbl format for submission. I have tried to use scripts from GAG (Genome Annotation Generator) and maker (gff32table). Both of these scripts work wonderfully on the gff originally output from maker, but do not work once I rename the genes for GenBank submission. When I feed my file into a gff validator it turns out that my gff is valid prior to renaming, but after I rename, the gff is no longer valid. I have been trying to troubleshoot what is happening to my gff when I rename as in Support Protocol 2, but am stumped. Has anyone else out there had a similar issue? I would be very thankful for any insight that you can provide!

Best,
Glenna

Not sure if this will be helpful, but here is an example gene from prior to renaming:

##gff-version 3
ChromoV|quiver|quiver maker gene 62081 62650 . + .
ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9 ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189 ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 And after renaming: ##gff-version 3 ChromoV|quiver|quiver maker gene 62081 62650 . + . ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9; ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00; ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; The commands I used were: % maker_map_ids --prefix_A9K44_ --justify 4 myfilename.gff>myfilename.map %map_gff_ids myfilename.map myfilename.gff -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adf at ncgr.org Mon Mar 20 19:49:22 2017 From: adf at ncgr.org (Andrew Farmer) Date: Mon, 20 Mar 2017 19:49:22 -0600 Subject: [maker-devel] GFF no longer valid after renaming genes In-Reply-To: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca> References: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca> Message-ID: <127be156-b2bd-574f-5187-9942f05220e2@ncgr.org> Hi Glenna- this may be totally off-base but I have a vague memory that some validators will complain about the semicolon after the last attribute in the column nine attribute list; it's not clear to me from the specification that this is truly illegal, but can imagine why a parser might not like to deal with it. In any case, you might try just removing that terminal semicolon character and see if that solves the validation complaint. but apologies in advance if my dim recollection has misled me into wasting your time... Andrew Farmer On 3/20/17 7:37 PM, Glenna Kramer wrote: > Hi there, > > I am hoping that you can give me some assistance with finishing up my > maker annotated genome for submission. I have been able to rename the > genes for GenBank submission - using Support Protocol 2 in the paper > by Campbell et. al "Genome Annotation and Curation Using MAKER and > MAKER-P" Curr Protoc Bioinformatics. 2014; 48: 4.11.1?4.11.39. > (PMC4286374). > I have also been able to use the Support Protocol 3 from that same > paper to assign a putative gene function. However, I am running into > problems when I am trying to convert the GFF file to the tbl format > for submission. I have tried to use scripts from GAG (Genome > Annotation Generator) and maker (gff32table). Both of these scripts > work wonderfully on the gff originally output from maker, but do not > work once I rename the genes for GenBank submission. 
When I feed my > file into a gff validator it turns out that my gff is valid prior to > renaming, but after I rename the gff is no longer valid. I have been > trying to troubleshoot what is happening to my gff when I rename as in > Support Protocol 2, but am stumped. Has anyone else out there had a > similar issue? I would be very thankful for any insight that you can > provide! > > Best, > Glenna > > Not sure if this will be helpful, but here is an example gene from > prior to renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9 > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189 > ChromoV|quiver|quiver maker exon 62081 62650 . + . > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 > ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > > And after renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . > ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9; > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . 
> ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00;
> ChromoV|quiver|quiver maker exon 62081 62650 . + .
> ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1;
> ChromoV|quiver|quiver maker CDS 62081 62650 . + 0
> ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1;
>
> The commands I used were:
>
> % maker_map_ids --prefix_A9K44_ --justify 4 myfilename.gff>myfilename.map
>
> %map_gff_ids myfilename.map myfilename.gff
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-- 
...all concepts in which an entire process is semiotically concentrated elude definition; only that which has no history is definable.

Friedrich Nietzsche

From carsonhh at gmail.com Tue Mar 21 10:15:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 Mar 2017 10:15:20 -0600
Subject: [maker-devel] GFF no longer valid after renaming genes
In-Reply-To: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>
References: <4781C7F0FC2DAA4BBC18FC44DC9D09AEFAB2016B@ArborExMBx4P.UTORARBOR.UTORAD.Utoronto.ca>
Message-ID: <5DFD02E2-2C6F-49DA-90DE-9E17EE0A8CE2@gmail.com>

The problem appears to be the multiple '|' characters in your contig names (ChromoV|quiver|quiver). They end up in the gene ID, and since '|' has a special meaning in Perl, it creates weird replacement behavior. I've attached two scripts that will fix that.
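
The replacement behavior described above can be reproduced in a few lines. This is only a minimal sketch, written in Python rather than Perl (an unescaped '|' is alternation in both regular expression dialects); it is not the code of map_gff_ids or of the attached scripts. The variable names and the choice to replace only the first match are assumptions, picked because they reproduce the gene ID shown in the renamed GFF3 in the report.

```python
import re

# Gene ID copied from the report; the '|' characters come from the contig name.
old_id = "augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9"
new_id = "A9K44_2555"  # new name taken from the renamed example
attribute = "ID=" + old_id

# Unescaped: '|' means alternation, so the pattern reads as
# "augustus_masked-ChromoV" OR "quiver" OR "quiver-processed-gene-0.9",
# and only the first alternative that matches gets replaced.
broken = re.sub(old_id, new_id, attribute, count=1)

# Escaped: the whole ID is matched literally
# (in Perl the equivalent is \Q...\E or quotemeta).
fixed = re.sub(re.escape(old_id), new_id, attribute, count=1)

print(broken)  # ID=A9K44_2555|quiver|quiver-processed-gene-0.9
print(fixed)   # ID=A9K44_2555
```

The first output keeps the '|quiver|quiver-processed-gene-0.9' tail, as in the renamed gene line of the report; escaping the old ID before using it as a pattern gives the intended short name.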
Use them to replace their counterparts in the .../maker/bin/ and .../maker/src/bin/ directories, then rerun all renaming steps on a new gff3 (not the one you already tried to rename).

Also you may want to consider changing IDs in the assembly itself before you release it or use it for analysis. You would want to remove the '|quiver|quiver' tail on every contig. That tail has the potential to open up hidden downstream analysis errors from other tools for the same reasons outlined above, since '|' characters have special meaning.

Thanks,
Carson

> On Mar 20, 2017, at 7:37 PM, Glenna Kramer wrote:
>
> Hi there,
>
> I am hoping that you can give me some assistance with finishing up my maker annotated genome for submission. I have been able to rename the genes for GenBank submission - using Support Protocol 2 in the paper by Campbell et. al "Genome Annotation and Curation Using MAKER and MAKER-P" Curr Protoc Bioinformatics. 2014; 48: 4.11.1-4.11.39. (PMC4286374). I have also been able to use the Support Protocol 3 from that same paper to assign a putative gene function. However, I am running into problems when I am trying to convert the GFF file to the tbl format for submission. I have tried to use scripts from GAG (Genome Annotation Generator) and maker (gff32table). Both of these scripts work wonderfully on the gff originally output from maker, but do not work once I rename the genes for GenBank submission. When I feed my file into a gff validator it turns out that my gff is valid prior to renaming, but after I rename the gff is no longer valid. I have been trying to troubleshoot what is happening to my gff when I rename as in Support Protocol 2, but am stumped. Has anyone else out there had a similar issue? I would be very thankful for any insight that you can provide!
>
> Best,
> Glenna
>
> Not sure if this will be helpful, but here is an example gene from prior to renaming:
>
> ##gff-version 3
> ChromoV|quiver|quiver maker gene 62081 62650 . + .
ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9 > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9;Name=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_eAED=0.00;_QI=0|-1|0|1|-1|1|1|0|189 > ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > ChromoV|quiver|quiver maker CDS 62081 62650 . + 0 ID=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1 > > And after renaming: > > ##gff-version 3 > ChromoV|quiver|quiver maker gene 62081 62650 . + . ID=A9K44_2555|quiver|quiver-processed-gene-0.9;Name=A9K55_2555|quiver|quiver-processed-gene-0.9;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9; > ChromoV|quiver|quiver maker mRNA 62081 62650 . + . ID=A9K44_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Parent=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9;Name=A9K55_2555|A9K55_2555-RA|quiver-processed-gene-0.9-mRNA-1;Alias=augustus_masked-ChromoV|quiver|quiver-processed-gene-0.9-mRNA-1;_AED=0.00;_QI=0|-1|0|1|-1|1|1|0|189;_eAED=0.00; > ChromoV|quiver|quiver maker exon 62081 62650 . + . ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:exon:11978;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; > ChromoV|quiver|quiver maker CDS 62081 62650 . 
+ 0 ID=A9K44_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1:cds;Parent=A9K55_2555-RA|quiver|quiver-processed-gene-0.9-mRNA-1; > > The commands I used were: > > % maker_map_ids --prefix_A9K44_ --justify 4 myfilename.gff>myfilename.map > > %map_gff_ids myfilename.map myfilename.gff > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: map_fasta_ids Type: application/octet-stream Size: 1676 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: map_gff_ids Type: application/octet-stream Size: 5048 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 21 11:00:06 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Mar 2017 11:00:06 -0600 Subject: [maker-devel] split genes In-Reply-To: References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com> Message-ID: I have no suggestions, but maybe someone else on the list may have some. ?Carson > On Mar 17, 2017, at 11:49 AM, Quanwei Zhang wrote: > > Thank you for your explanation. But do you have any suggestions on such issues? Is there any tools to detect such split genes or any other tool can even further improve the gene models obtained by Maker? Thanks. > > Best > Quanwei > > 2017-03-17 11:21 GMT-04:00 Carson Holt >: > MAKER will not try and predict a gene across contigs because it it too difficult to determine contig order. If you are able to determine order, then it is best to merge the contigs into a single scaffold before annotating rather than try and produce split models in GFF3. 
>
> --Carson
>
> > On Mar 16, 2017, at 9:48 PM, Quanwei Zhang wrote:
> >
> > Hello:
> >
> > If one gene is covered by two contigs, we may sometimes predict two genes. I wonder how MAKER deals with such cases?
> > Even if MAKER tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?
> >
> > As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the MAKER pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, could the predicted genes be merged into one?
> >
> > Thanks
> >
> > Best
> > Quanwei
> >
>
>

From carsonhh at gmail.com Tue Mar 21 11:01:30 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 Mar 2017 11:01:30 -0600
Subject: [maker-devel] evidence of transcript assembly
In-Reply-To:
References:
Message-ID: <297B9C95-919E-4D4F-9103-1FED1550B745@gmail.com>

Different sources of data will have different levels of quality. You may want to run them all, then look at the results in a browser like Apollo. If specific sources look more problematic than others, then drop them.

--Carson

> On Mar 20, 2017, at 11:09 AM, Quanwei Zhang wrote:
>
> Hello:
>
> I am using Maker2 to do gene annotation on a new rodent species. I have found some published RNA-seq data with selected open reading frames. Generally, the authors built the transcript assembly with Trinity, then mapped the raw transcript assemblies to the mouse genome and selected those with full or partial coverage of mouse genes. I have a question about transcript assembly evidence for MAKER. Which do you think is the best choice as evidence for Maker2?
> (1) All the Trinity transcript assemblies?
> (2) Trinity transcript assemblies that fully cover the mouse genes?
> (3) Trinity transcript assemblies that either fully or partly cover the mouse genes?
>
> Many thanks
>
> Best
> Quanwei

From cjfields at illinois.edu Tue Mar 21 11:47:21 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 21 Mar 2017 17:47:21 +0000
Subject: [maker-devel] split genes
In-Reply-To:
References: <1E41F8B0-4699-42C5-B782-4AC16AB846C9@gmail.com>
Message-ID:

Just curious, but have you tried scaffolding your assembly using your RNA-Seq de novo assembly data? We've seen some improvement with BUSCO calls and annotation after doing this using L_RNA_Scaffolder (though you do need to be a bit careful and try reducing your transcript assembly down to a somewhat non-redundant set).

chris

From: maker-devel on behalf of Carson Holt
Date: Tuesday, March 21, 2017 at 12:00 PM
To: Quanwei Zhang
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] split genes

I have no suggestions, but maybe someone else on the list may have some.

--Carson

On Mar 17, 2017, at 11:49 AM, Quanwei Zhang wrote:

Thank you for your explanation. But do you have any suggestions on such issues? Are there any tools to detect such split genes, or any other tool that can further improve the gene models obtained by MAKER? Thanks.

Best
Quanwei

2017-03-17 11:21 GMT-04:00 Carson Holt:
MAKER will not try to predict a gene across contigs because it is too difficult to determine contig order. If you are able to determine the order, then it is best to merge the contigs into a single scaffold before annotating rather than try to produce split models in GFF3.
--Carson

> On Mar 16, 2017, at 9:48 PM, Quanwei Zhang wrote:
>
> Hello:
>
> If one gene is covered by two contigs, we may sometimes predict two genes. I wonder how MAKER deals with such cases?
> Even if MAKER tries to reduce such cases, they cannot be completely avoided. So I wonder whether there is any way or any tool to find such split genes (one gene split across two contigs and predicted as two genes)?
>
> As we know, we can also provide protein sequences and transcript assemblies as evidence. Can a protein sequence or transcript assembly rescue the split genes in the MAKER pipeline? For example, if one transcript covers 40% of the genes predicted on two contigs, could the predicted genes be merged into one?
>
> Thanks
>
> Best
> Quanwei
>

From rainer.rutka at uni-konstanz.de Fri Mar 24 03:10:45 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Fri, 24 Mar 2017 10:10:45 +0100
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To: <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID:

Hi!
First of all, thank you for your previous help.
MAKER 2.31.9 with Intel MPI runs fine if we use ONE node only.

But if we try to combine more than one node (e.g. 2 nodes with 8 cores each), we get this error:

[...]
### Running Maker example

MOAB_PROCCOUNT: 16
slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead
STATUS: Parsing control files...
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
[...]

/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356 was created beforehand and EXISTS for the whole duration of the job.

I attached the complete log to this e-mail.

Again: THANK YOU VERY MUCH.

All the best.

--
Rainer Rutka
Universität Konstanz
Kommunikations-, Informations-, Medienzentrum (KIM)
* KIM Ausbildung
* Wissenschaftliches Rechnen/bwHPC-C5
* KIM Basisdienste, KIM Support
Room: V511
78457 Konstanz
+49 7531 88-5413
-------------- next part --------------
#!/bin/bash
#MSUB -N maker-job
#MSUB -j oe
#MSUB -o $(JOBNAME).$(JOBID)
#MSUB -m ae
# -M given_name.family_name at your-uni.de
#MSUB -l nodes=2:ppn=8
#MSUB -l mem=20gb
#MSUB -l walltime=01:00:00
#
start=$(date +%s)
echo " "
echo "### Setting up shell environment ..."
echo " "
#
if test -e "/etc/profile"; then source "/etc/profile"; fi;
if test -e "$HOME/.bash_profile"; then source "$HOME/.bash_profile"; fi;
unset LANG; export LC_ALL="C"; export MKL_NUM_THREADS=1; export OMP_NUM_THREADS=1
export USER=${USER:=`logname`}
export MOAB_JOBID=${MOAB_JOBID:=`date +%s`}
export MOAB_SUBMITDIR=${MOAB_SUBMITDIR:=`pwd`}
export MOAB_JOBNAME=${MOAB_JOBNAME:=`basename "$0"`}
export MOAB_JOBNAME=$(echo "${MOAB_JOBNAME}" | sed 's/[^a-zA-Z0-9._-]/_/g')
export MOAB_NODECOUNT=${MOAB_NODECOUNT:=1}
export MOAB_PROCCOUNT=${MOAB_PROCCOUNT:=1}
ulimit -s 200000
echo " "
echo "### Printing basic job infos to stdout ..."
echo " "
echo "START_TIME = `date +'%y-%m-%d %H:%M:%S %s'`"
echo "HOSTNAME = ${HOSTNAME}"
echo "USER = ${USER}"
echo "MOAB_JOBNAME = ${MOAB_JOBNAME}"
echo "MOAB_JOBID = ${MOAB_JOBID}"
echo "MOAB_SUBMITDIR = ${MOAB_SUBMITDIR}"
echo "MOAB_NODECOUNT = ${MOAB_NODECOUNT}"
echo "MOAB_PROCCOUNT = ${MOAB_PROCCOUNT}"
echo "SLURM_NODELIST = ${SLURM_NODELIST}"
echo "PBS_NODEFILE = ${PBS_NODEFILE}"
if test -f "${PBS_NODEFILE}"; then
  echo "PBS_NODEFILE (begin) ---------------------------------"
  NO_NODES=$(wc -l < ${PBS_NODEFILE})
  cat "${PBS_NODEFILE}"
  echo "PBS_NODEFILE (end) -----------------------------------"
else
  NO_NODES=1
fi
#
##############################################################################
echo " "
echo "### Creating TMP_WORK_DIR directory and changing to it ..."
echo " "
# Using "/tmp/$USER" should be ok for one node jobs. In case of multi-node jobs
# it might be necessary to modify TMP_BASE_DIR to point to SLURM_SUBMIT_DIR
# or to create (and delete) TMP_WORK_DIR on each node (job-type dependent).
# NEVER EVER calculate in your home directory.
JOB_WORK_DIR="${SLURM_JOB_NAME}.uc1.${SLURM_JOB_ID%%.*}.$(date +%y%m%d_%H%M%S)"
if test -z "$SLURM_NNODES" -o "$SLURM_NNODES" = "1"
then
  TMP_BASE_DIR="/tmp/${USER}"
else
  # in case of 2 or more nodes, use a common scratch dir available on all nodes...
  TMP_BASE_DIR="$SLURM_SUBMIT_DIR"
fi
TMP_WORK_DIR="${TMP_BASE_DIR}/${JOB_WORK_DIR}"
echo "JOB_WORK_DIR = ${JOB_WORK_DIR}"
echo "TMP_BASE_DIR = ${TMP_BASE_DIR}"
echo "TMP_WORK_DIR cd = ${TMP_WORK_DIR}"
mkdir -vp "${TMP_WORK_DIR}" && { cd "${TMP_WORK_DIR}"; pwd; } || { echo "ERROR: cd $TMP_WORK_DIR"; exit 1; }
# Remarks:
# * The job's temporary subdirectory JOB_WORK_DIR consists of SLURM_JOB_NAME
#   and SLURM_JOB_ID connected by ".uc1.". This is a little bit of magic since
#   the output file of your job follows the same rule. Therefore the
#   sorting of files belonging to one job will work nicely, when you
#   list the result files later in the submit directory (SLURM_SUBMIT_DIR).
# * Using TMP_BASE_DIR="/tmp/$USER" is ok, if the job requires less
#   than 3.6 TB of node local disk space (for details see "www.bwhpc-c5.de").
#
##############################################################################
echo " "
echo "### Loading MAKER module:"
echo " "
module load bio/maker/2.31.9
[ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
echo "MAKER_VERSION = $MAKER_VERSION"
module list
echo " "
echo "### Copying input examples files for job:"
echo " "
cp -v ${MAKER_EXA_DIR}/*.{fasta,ctl} .
sleep 2
echo " "
echo "### Display internal Maker/bwHPC environments..."
echo " "
echo "MAKER_BIN_DIR = ${MAKER_BIN_DIR}"
echo "MAKER_EXA_DIR = ${MAKER_EXA_DIR}"
echo ""
echo " "
echo "### Runing Maker example"
echo " "
export OMPI_MCA_mpi_warn_on_fork=0
#
# Do NOT use mpiexec here. Unfortunately this crashes
# "STATUS: Processing and indexing input FASTA files..."
# exec.hydra -n 2 maker -h
echo "MOAB_PROCCOUNT: ${MOAB_PROCCOUNT:=1}"
# do NOT use mpiexec. use mpiexec.hydra or mpirun.
# mpirun -n ${MOAB_PROCCOUNT} maker -h
# mpirun -n ${MOAB_PROCCOUNT} maker 2>&1 >maker_$(date +%Y-%m-%d_%H:%M:%S).out
mpirun -n ${MOAB_PROCCOUNT} maker
echo "### Cleaning up files ... removing unnecessary scratch files ..."
echo " "
# rm -fv
sleep 3 # Sleep some time so potential stale nfs handles can disappear.
echo " "
echo "### Compressing results and copying back result archive ..."
echo " "
cd "${TMP_BASE_DIR}"
mkdir -vp "${MOAB_SUBMITDIR}" # if user has deleted or moved the submit dir
echo "Creating result tgz-file '${MOAB_SUBMITDIR}/${JOB_WORK_DIR}.tgz' ..."
tar -zcvf "${MOAB_SUBMITDIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \
  || { echo "ERROR: Failed to create tgz-file. Please cleanup TMP_WORK_DIR '$TMP_WORK_DIR' on host '$HOSTNAME' manually (if not done automatically by queueing system)."; exit 102; }
# Remarks:
# * The resulting tgz file is copied back to the submit directory.
#   The name of the tgz file looks similar to
#   "bwunicluster-maker-example.moab.275.110528_101755.tgz"
echo " "
echo "### Final cleanup: Remove TMP_WORK_DIR ..."
echo " "
rm -rvf "${TMP_WORK_DIR}"
echo "END_TIME = `date +'%y-%m-%d %H:%M:%S %s'`"
end=$(date +%s)
echo " "
echo "### Calculate duration ..."
echo " "
diff=$[end-start]
if [ $diff -lt 60 ]; then
  echo "Runtime (approx.): '$diff' secs"
elif [ $diff -ge 60 ]; then
  echo 'Runtime (approx.): '$[$diff / 60] 'min(s) '$[$diff % 60] 'secs'
fi
-------------- next part --------------

### Setting up shell environment ...

### Printing basic job infos to stdout ...

START_TIME = 17-03-24 04:35:21 1490326521
HOSTNAME = uc1n385
USER = kn_pop235844
MOAB_JOBNAME = maker-job
MOAB_JOBID = 11658541
MOAB_SUBMITDIR = /pfs/work2/workspace/scratch/kn_pop235844-wstest-0
MOAB_NODECOUNT = 2
MOAB_PROCCOUNT = 16
SLURM_NODELIST = uc1n[385,397]
PBS_NODEFILE =

### Creating TMP_WORK_DIR directory and changing to it ...
JOB_WORK_DIR = maker-job.uc1.11658541.170324_043521
TMP_BASE_DIR = /tmp/kn_pop235844
TMP_WORK_DIR cd = /tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521
mkdir: created directory '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521'
/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521

### Loading MAKER module:

MAKER_VERSION = 2.31.9
Currently Loaded Modulefiles:
1) compiler/intel/16.0(default) 2) mpi/impi/5.1.3-intel-16.0(default) 3) bio/maker/2.31.9

### Copying input examples files for job:

'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_contig.fasta' -> './dpp_contig.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_est.fasta' -> './dpp_est.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_protein.fasta' -> './dpp_protein.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_bopts.ctl' -> './maker_bopts.ctl'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_exe.ctl' -> './maker_exe.ctl'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_opts.ctl' -> './maker_opts.ctl'

### Display internal Maker/bwHPC environments...

MAKER_BIN_DIR = /opt/bwhpc/common/bio/maker/2.31.9/bin
MAKER_EXA_DIR = /opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples

### Runing Maker example

MOAB_PROCCOUNT: 16
slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521': No such file or directory: going to /tmp instead
STATUS: Parsing control files...
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
### Cleaning up files ... removing unnecessary scratch files ...

### Compressing results and copying back result archive ...

Creating result tgz-file '/pfs/work2/workspace/scratch/kn_pop235844-wstest-0/maker-job.uc1.11658541.170324_043521.tgz' ...
maker-job.uc1.11658541.170324_043521/
maker-job.uc1.11658541.170324_043521/dpp_contig.fasta
maker-job.uc1.11658541.170324_043521/dpp_est.fasta
maker-job.uc1.11658541.170324_043521/dpp_protein.fasta
maker-job.uc1.11658541.170324_043521/maker_bopts.ctl
maker-job.uc1.11658541.170324_043521/maker_exe.ctl
maker-job.uc1.11658541.170324_043521/maker_opts.ctl
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/.NFSLock.gi_lock.NFSLock
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_opts.log
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_bopts.log
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_exe.log

### Final cleanup: Remove TMP_WORK_DIR ...

removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_est.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_protein.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_bopts.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_exe.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_opts.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/.NFSLock.gi_lock.NFSLock'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_opts.log'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_bopts.log'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_exe.log'
removed directory: '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output'
removed directory: '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521'
END_TIME = 17-03-24 04:36:08 1490326568

### Calculate duration ...

Runtime (approx.): '47' secs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5055 bytes
Desc: S/MIME Cryptographic Signature
URL:

From carsonhh at gmail.com Fri Mar 24 09:00:58 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 24 Mar 2017 09:00:58 -0600
Subject: [maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE
In-Reply-To:
References: <021ac88b-3574-14cf-ce56-acf9e07f0fab@uni-konstanz.de> <999a411b-9ba3-ec33-e7f7-ab0f8294e777@uni-konstanz.de> <9c57acf0-30ee-3713-65c0-801edac10098@uni-konstanz.de> <1b1dd2ab-d9fb-cea0-9161-55cb2a4cfb6a@uni-konstanz.de> <341895b3-421f-af4e-f805-61d63c500fd6@uni-konstanz.de> <62E6AC62-7EF3-4AA0-A584-0687BF23E2C6@gmail.com> <2E82A30B-5B42-41A9-BEC0-2A0461739682@gmail.com>
Message-ID: <2D6022EE-3AFC-4B87-99A3-2D310995A844@gmail.com>

This error:

slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead

is from SLURM and not from MAKER. It occurs before your job has even started, during the SLURM initialization of one of the nodes you are using. Note that /tmp is not shared; it is independent on each node. So /tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356 may exist on one node, but not on the others. Since you are somehow setting this path before you launch the job, SLURM complains because it doesn't exist on one of the other nodes during initialization. So you need to review how you are launching things.

--Carson

> On Mar 24, 2017, at 3:10 AM, Rainer Rutka wrote:
>
> HI!
> First of all thank you for your previous help.
> Running Maker 2.31.9 with MPI (Intel) is running fine, if we
> use ONE node only.
>
> But, if we try to combine more than one node (e.g. 2 nodes with 8
> cores each) we get this error:
>
> [...]
> ### Running Maker example
>
> MOAB_PROCCOUNT: 16
> slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file or directory: going to /tmp instead
> STATUS: Parsing control files...
> Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
> [...]
>
> /tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356
> was created beforehand and EXISTS for the whole duration of the
> job.
>
> I attached the complete log to this e-mail.
>
> Again: THANK YOU VERY MUCH.
>
> All the best.
>
> --
> Rainer Rutka
> Universität Konstanz
> Kommunikations-, Informations-, Medienzentrum (KIM)
> * KIM Ausbildung
> * Wissenschaftliches Rechnen/bwHPC-C5
> * KIM Basisdienste, KIM Support
> Room: V511
> 78457 Konstanz
> +49 7531 88-5413
>

From carson.holt at genetics.utah.edu Wed Mar 29 12:12:35 2017
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 29 Mar 2017 18:12:35 +0000
Subject: [maker-devel] non-M gene models
In-Reply-To: <59ca4391-d32e-bfa8-4118-8c9586f3dfe4@email.arizona.edu>
References: <717138b6-fc7f-8f23-e550-c3019c4f96ec@email.arizona.edu> <59ca4391-d32e-bfa8-4118-8c9586f3dfe4@email.arizona.edu>
Message-ID: <0AD41A2D-9CFE-48DE-B338-F15D3A590B30@genetics.utah.edu>

Maybe. Those two options can result in a lot of partial models. Also, setting always_complete=1 will help some. Models without M at the start are generally partial models. There is often something about the contig that keeps it from being a whole model (a single base-pair error that breaks the ORF or a splice site, or a string of N's that overlaps part of an exon). You can also try identifying InterPro domains and dropping any model without a defined domain (i.e. if it's going to be partial, at least make sure it's useful in its partial form).
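
To see which models are affected before deciding how to handle them, the first residue of every protein can be tallied and the non-M IDs listed. The following is a minimal sketch in Python and not part of MAKER; the function names and the tiny inline FASTA are made up for illustration. Unlike a grep -A1 based count, it also copes with sequences wrapped over several lines.

```python
from collections import Counter
import io


def first_residues(handle):
    """Yield (sequence_id, first_residue) for each record in a FASTA stream."""
    seq_id, first = None, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, first
            parts = line[1:].split()
            seq_id, first = (parts[0] if parts else ""), None
        elif seq_id is not None and first is None:
            # Only the first sequence line of a record matters here.
            first = line[0].upper()
    if seq_id is not None:
        yield seq_id, first


def summarize(handle):
    """Return (Counter of first residues, list of IDs not starting with M)."""
    counts = Counter()
    non_m = []
    for seq_id, first in first_residues(handle):
        counts[first] += 1
        if first != "M":
            non_m.append(seq_id)
    return counts, non_m


# Hypothetical three-record example; replace with open("your_proteins.fasta").
demo = io.StringIO(
    ">gene1-mRNA-1 protein\nMKT\nLLV\n"
    ">gene2-mRNA-1\nALV\n"
    ">gene3-mRNA-1\nmst\n"
)
counts, non_m = summarize(demo)
print(counts)
print(non_m)
```

The resulting ID list could then be used to pull those models out for inspection, for example to check them for gaps or missing domains as described above.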
--Carson

On Mar 29, 2017, at 4:23 AM, Dario Copetti wrote:

Looking at the config file again I notice this:

est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no

I usually turn them on only to get models from ESTs to train Augustus and SNAP. Do you think that having these parameters on during the final annotation will produce the non-M models? If so, do you think that re-running MAKER with them turned off and using the MAKER-derived gff3 will clean out these models? Can you elaborate a bit more on the usage of these two parameters?

Thanks,
Dario

On 3/29/2017 12:07 PM, Dario Copetti wrote:

Hi Carson,

We are ready to submit several different sets of annotations, but we are now stuck with the issue of having models whose protein sequence does not start with Met, and NCBI is picky about that. Below I paste an example from a genome we are working on: as you see, most (95%) of the models start with M, but a significant fraction (almost 1500 models!) does not. We used MAKER 2.31.8, specifying the option of having models that only start with M.

We realize that this issue may not be easy to fix, and also that there are indeed isoforms that do not start with M, but how would you fix this? Within or outside MAKER, I mean; any help will be appreciated. Some time ago, Josh and Sharon (cc'd) fixed the models by having the CDS start at the first M that was in frame with the exon, and wrote a script for that. Is this issue maybe fixed in a newer version of MAKER? How else would you fix it or deal with the NCBI genomes people?

Thanks,
Dario

grep -A1 ">" maker_proteins_161026.fasta | grep -v ">" | grep -v "\-\-" | cut -c1 | sort | uniq -c
    106 A
     33 C
     69 D
     88 E
     53 F
     94 G
     34 H
     86 I
     77 K
    144 L
  28245 M
     58 N
     72 P
     44 Q
     95 R
    142 S
     80 T
    114 V
     29 W
      6 X
     53 Y

--
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5
1657 E. Helen St.
Tucson, AZ 85721, USA
www.genome.arizona.edu

--
Dario Copetti, PhD
Research Associate | Arizona Genomics Institute
University of Arizona | BIO5
1657 E. Helen St.
Tucson, AZ 85721, USA
www.genome.arizona.edu

From annabel.beichman at gmail.com Thu Mar 30 11:51:36 2017
From: annabel.beichman at gmail.com (Annabel Beichman)
Date: Thu, 30 Mar 2017 10:51:36 -0700
Subject: [maker-devel] RepeatMasker masking olfactory receptors
Message-ID: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>

Hi Carson,

I have a question about RepeatMasker within MAKER. I am finding that all class II olfactory receptors (families like OR2, OR5) are being masked by RepeatMasker as "RTE-BovB" repeats. This leads to them not being annotated by MAKER. I don't expect my species (a mustelid) to have a large number of BovB repeats, and when I put the sequences annotated in my genome as RTE-BovB into Repbase's CENSOR, only 13 out of 960 sequences have a hit to anything in Repbase. If I put those same sequences into NCBI BLAST, however, they all hit olfactory receptors. I am finding the same pattern with another related mustelid de novo genome. I also took the Ensembl ferret genome and ran it through the same pipeline, and I am finding a large number of BovB repeats there as well, despite there being none in the official annotation of that genome.

I used RepeatMasker with all species libraries, plus a custom library from RepeatModeler.

Any idea what might be going on?

Thanks so much!
~ Annabel

From 4urelie.K at gmail.com Thu Mar 30 12:54:07 2017
From: 4urelie.K at gmail.com (Aurelie K)
Date: Thu, 30 Mar 2017 12:54:07 -0600
Subject: [maker-devel] RepeatMasker masking olfactory receptors
In-Reply-To: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>
References: <27F33185-148C-4253-B597-D0B2B3151131@gmail.com>
Message-ID:

Hi Annabel,

I would run RM specifying your (group of) species, using the -species option of RepeatMasker (in RepeatMasker, -s on its own selects the slow search), especially if you have a custom de novo library. This will limit the cross-masking by repeats that have been identified in other species.

Cheers,
Aurelie

On 30 March 2017 at 11:51, Annabel Beichman wrote:

> Hi Carson,
> I have a question about RepeatMasker within MAKER.
> I am finding that all class II olfactory receptors (families like OR2,
> OR5) are being masked by RepeatMasker as "RTE-BovB" repeats. This leads to
> them not being annotated by MAKER. I don't expect my species (a mustelid)
> to have a large number of BovB repeats, and when I put the sequences
> annotated in my genome as RTE-BovB into Repbase's CENSOR only 13 out of 960
> sequences have a hit to anything in Repbase. If I put those same sequences
> into NCBI BLAST, however, they all hit olfactory receptors. I am
> finding the same pattern with another related mustelid de novo genome, and
> took the Ensembl ferret genome and ran it through the same pipeline and am
> finding a large number of BovB repeats there as well, despite there being
> none in the official annotation of that genome.
>
> I used RepeatMasker with all species libraries, plus a custom library from
> RepeatModeler.
>
> Any idea what might be going on?
>
> Thanks so much!
>
> ~ Annabel
>