From rob.syme at gmail.com Thu Jan 5 00:41:25 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Thu, 05 Jan 2017 06:41:25 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts

Hi all

The MAKER wiki page "Repeat Library Construction - Advanced" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately? Does anybody know where to find them?

Thanks!

Rob Syme
Research Associate
Curtin University

From olegl at volcani.agri.gov.il Thu Jan 5 04:07:31 2017
From: olegl at volcani.agri.gov.il (Oleg Lovky)
Date: Thu, 5 Jan 2017 10:07:31 +0000
Subject: [maker-devel] Unable to train SNAP
Message-ID: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>

Hello,

I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000). However, I'm not getting transcript and protein fasta files at all, despite Maker not giving any errors and everything being listed as finished in the datastore log file. Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files.

Please advise.

Regards,

Oleg Lovky, MSc.
Research Engineer
Institute of Plant Sciences
ARO, Volcani Center
Cell: 054-4870319

From michael.s.campbell1 at gmail.com Thu Jan 5 08:54:17 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 5 Jan 2017 09:54:17 -0500
Subject: [maker-devel] Repeat library construction - CRL scripts
Message-ID: <3B3F80CA-BFA1-4F0E-A2F1-CA60E8496D5F@gmail.com>

Hi Rob,

There is a link near the bottom of that wiki page, at the end of the line "CRL and other custom scripts are available here." It points to this URL:
http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz

Thanks,
Mike

From rob.syme at gmail.com Thu Jan 5 19:29:35 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Fri, 06 Jan 2017 01:29:35 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts

Oh dear. That's embarrassing for me! Sorry for the silly question.

-r

From xvazquezc at gmail.com Thu Jan 5 20:23:17 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Fri, 6 Jan 2017 13:23:17 +1100
Subject: [maker-devel] Unable to train SNAP
In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>

Are you using the -n option with maker2zff? You often get empty genome.ann and genome.dna files if you don't.

--
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Fri Jan 6 13:28:02 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 6 Jan 2017 12:28:02 -0700
Subject: [maker-devel] Unable to train SNAP
In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
Message-ID: <8F65E561-7450-4B5A-8F1B-4E51C0D25BE2@gmail.com>

The maker2zff script has a number of thresholds that must be reached to avoid filtering all models. If you don't have protein evidence in the dataset, for example, then that filter may always be failing. You may just want to turn all filters off with the -n option, as previously suggested.

--Carson
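
The symptom discussed in this thread (empty genome.ann and genome.dna after maker2zff) can be checked for before moving on to SNAP training. The following is only a minimal sketch: the helper name empty_zff_outputs is hypothetical and not part of MAKER, and it assumes the two files were written to the directory passed in.

```python
from pathlib import Path

# Output file names as given in the thread above.
ZFF_OUTPUTS = ("genome.ann", "genome.dna")


def empty_zff_outputs(directory="."):
    """Return the expected maker2zff outputs that are missing or empty.

    Hypothetical helper for illustration; not part of MAKER.
    """
    problems = []
    for name in ZFF_OUTPUTS:
        path = Path(directory) / name
        # A missing file and a zero-byte file are treated the same way.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    bad = empty_zff_outputs(".")
    if bad:
        print("Missing or empty:", ", ".join(bad))
        print("All models may have been filtered; consider the -n option.")
    else:
        print("Both ZFF files contain data.")
```

If either file is reported, rerunning maker2zff with -n, as suggested in the replies, is the first thing to try.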

From kchilds at msu.edu Thu Jan 5 08:28:00 2017
From: kchilds at msu.edu (Childs, Kevin)
Date: Thu, 5 Jan 2017 14:28:00 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
Message-ID: <6AE4044B-9011-4421-A6F1-FE3B95BBB11D@msu.edu>

Rob,

The scripts can be found in a link at the bottom of this wiki page:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced

Kevin Childs

---
Kevin Childs, PhD
Assistant Professor - Fixed Term
Center for Genomics-Enabled Plant Science
Plant Biology Department
Michigan State University
kchilds at msu.edu
517-775-2844 (m)
517-884-6926 (o)
http://childslab.plantbiology.msu.edu

From brubin at fieldmuseum.org Fri Jan 6 19:22:10 2017
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Fri, 6 Jan 2017 20:22:10 -0500
Subject: [maker-devel] /tmp full

Hi all,

Maker keeps filling up the /tmp directories on the cluster I am using. It appears that most of the space is taken with many versions of various blast databases. I suspect that this issue is partly due to my not using MPI and instead launching multiple instances of maker (typically 16) in the same working directory. However, it appears that maker is also leaving some of these databases in /tmp even after it has died or been killed and they are piling up.

I am submitting my jobs to the cluster via SLURM but have installed maker locally rather than system-wide. My system administrator is going to try creating a larger locally mounted directory on some of the nodes for me, but I wanted to check whether you have any other suggestions to solve the issue or to make sure that maker cleans up /tmp as aggressively as possible.

I am using maker3-beta.

Thanks for any help,
Ben

From carsonhh at gmail.com Sat Jan 7 17:29:29 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 7 Jan 2017 16:29:29 -0700
Subject: [maker-devel] /tmp full

If you use the MPI settings, then all processes will share a single temporary directory; otherwise they will each have a separate one, since they can't intercommunicate.

MAKER tries to clean up its files on finish or failure, but if you or the system kill it with certain signals, then it is reaped immediately by the system and not allowed to finish cleaning up. Signals 9 and 19, for example, will do that. If a failure is related to the drive being full or a memory issue, then your system may be hitting it with one of these uncatchable signals. For example, SLURM may use signal 9 or 19 if a process fails to respond to signal 15 in a timely manner (i.e. MAKER may be removing files, but SLURM gets impatient and kills it more aggressively because it thinks the process is not responding). You can always try to empty /tmp as the first step in your batch script, and it will remove files belonging to you before launching MAKER.

--Carson

From brubin at fieldmuseum.org Sun Jan 8 10:24:36 2017
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Sun, 8 Jan 2017 11:24:36 -0500
Subject: [maker-devel] /tmp full

OK, thanks for the tips. Knowing the particulars of how SLURM might be causing this is extremely helpful. I'll try to just empty /tmp before running MAKER on each node, as you suggest. I suspect that will work, but I will work on getting MPI running as well.

Thanks!
Ben
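
The "empty /tmp as the first step" advice from this thread could be sketched roughly as follows. This is an illustration under stated assumptions, not MAKER's own cleanup code: the function name remove_own_files is hypothetical, it removes only entries owned by the given user ID, and it would also remove temporary files of any other job of yours still running on the same node, so it only makes sense when the node is not shared between your own jobs.

```python
import os


def remove_own_files(tmp_dir, uid=None):
    """Delete files and empty directories under tmp_dir that belong to uid.

    Hypothetical helper for illustration. Returns the number of
    filesystem entries removed. tmp_dir itself is never removed.
    """
    if uid is None:
        uid = os.getuid()
    removed = 0
    # Walk bottom-up so each directory is visited after its contents.
    for root, dirs, files in os.walk(tmp_dir, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.lstat(path).st_uid == uid:
                    os.remove(path)
                    removed += 1
            except OSError:
                pass  # already gone or not removable; leave it alone
        for name in dirs:
            path = os.path.join(root, name)
            try:
                if os.lstat(path).st_uid == uid:
                    if os.path.islink(path):
                        os.remove(path)  # symlink to a directory
                    else:
                        os.rmdir(path)   # fails (and is skipped) if not empty
                    removed += 1
            except OSError:
                pass
    return removed


# Example call at the top of a batch job (the path is an assumption):
# remove_own_files("/tmp")
```

Because signals 9 and 19 cannot be caught, no handler inside the killed process can do this work, which is why the cleanup has to happen in a later job step such as this one.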

--
Benjamin ER Rubin, PhD
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605 USA
Office: (312) 665-7776

From lmainzer at life.illinois.edu Mon Jan 9 01:02:01 2017
From: lmainzer at life.illinois.edu (Liudmila Sergeevna Mainzer)
Date: Mon, 9 Jan 2017 01:02:01 -0600
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names

Hello, MAKER developers!

I tried submitting this bug report through the web form on the RepeatMasker web page, but I am getting an "invalid submission" message, so I decided to post here.

I found a weird bug that results in the notorious "index out of bounds" error reported by RepeatMasker. Significantly, this error only arises on very long file names generated by MAKER.

I traced this through the code and identified the error as originating in Tandem Repeats Finder. TRF sometimes splits up its output into separate files. When that happens, the pieces with index >1 do not contain the sequence name. Compare the first few lines between these two files:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html
  InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html
  File 1 of 2
     Tandem Repeats Finder Program written by:
                   Gary Benson
                   Program in Bioinformatics
                   Boston University
     Version 4.09
     Sequence: InputSequencefrag-1 CHUNK number:191 
     size:455659  offset:57300000
     
     Parameters: 2 3 5 75 20 33 7

etcetera
But also the second chunk:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html
  InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html
  File 2 of 2
  Found at i:56286 original size:1 final size:1
  Alignment explanation

  Indices: 56278--56322 Score: 55
  Period size: 1 Copynumber: 45.0 Consensus size: 1

etcetera

See how one file has the full header with the "Sequence:" statement and the other one does not? This "Sequence:" statement is used in the RepeatMasker code to name each piece of sequence that ends up being masked later. When this variable is empty (the name string is not defined), the setSubstr subroutine in the main RepeatMasker code breaks: the length of an undefined string is of course zero, and that subroutine has a check for sequences whose length is shorter than the region that needs to be masked.

So it quits with the statement "Error index out of bounds!", even though the sequence is of finite length, does not have any weird characters, and is maskable.

Once again, this only arises on very long file names, and those seem to be created by MAKER. Example:

LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific

Notice how the last part of the file name has a bunch of identifiers separated by %2E (generic URI-encoding)? I experimented with that file name. The path does not matter. The % signs do not matter. It is the length of the filename itself: if it is <108 characters, then RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like maybe Perl is not handling that long a name very well...

So the problem is three-fold: MAKER creates file names that are very long, while RepeatMasker breaks due to TRF failing to write the file headers properly for those very long file names.

Would you provide any suggestions or patches for this problem? It is forcing us to run RepeatMasker separately, outside the main MAKER workflow, which really complicates the data management and analysis as a whole.

We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl v5.10.1 built for x86_64-linux-thread-multi.

Many thanks in advance,
Liudmila Mainzer

----------------
Senior Research Scientist
National Center for Supercomputing Applications

Research Assistant Professor
Institute of Genomic Biology

University of Illinois
217-300-0568
1205 W. Clark St. Room 4026
Urbana, IL 61801

From carsonhh at gmail.com Mon Jan 9 10:30:09 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 9 Jan 2017 09:30:09 -0700
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
Message-ID: <733D5263-6CFC-4AB3-BFDD-30330B0E1985@gmail.com>

The name used by maker is based on the input file name, so a quick fix would just be to rename your input file to have a shorter name.

--Carson
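
The length threshold reported in this thread (file names of 108 or more characters break, the directory path does not matter) can be turned into a small pre-flight check. This is a sketch under the assumption that the 108-character observation holds for your RepeatMasker/TRF build; the names REPORTED_LIMIT, name_too_long and long_names are hypothetical and not part of MAKER or RepeatMasker.

```python
import os

# Threshold reported in the thread: names of 108+ characters failed.
REPORTED_LIMIT = 108


def name_too_long(path, limit=REPORTED_LIMIT):
    """True if the final path component has `limit` or more characters.

    Only the file name is measured; the thread reports that the
    directory part of the path does not matter.
    """
    return len(os.path.basename(path)) >= limit


def long_names(paths, limit=REPORTED_LIMIT):
    """Return the subset of paths whose file name reaches the limit."""
    return [p for p in paths if name_too_long(p, limit)]
```

Since the generated names are derived from the input file name, any hit from such a check points back to shortening that input name, as suggested in the reply above.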

From qlian003 at ucr.edu Wed Jan 11 23:28:32 2017
From: qlian003 at ucr.edu (Qihua Liang)
Date: Wed, 11 Jan 2017 21:28:32 -0800
Subject: [maker-devel] gff file: possible sources
Message-ID: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>

Hi Maker development team!

I am trying to figure out the second column of the gff file generated by maker, which should be the source of the annotation. The tutorial lists the following:

Possible Sources Include:

* BLASTN - BLASTN alignment of EST evidence
* BLASTX - BLASTX alignment of protein evidence
* TBLASTX - TBLASTX alignment of EST evidence from closely related organisms
* EST2Genome - Polished EST alignment from Exonerate
* Protein2Genome - Polished protein alignment from Exonerate
* SNAP - SNAP ab initio gene prediction
* GENEMARK - GeneMark ab initio gene prediction
* Augustus - Augustus ab initio gene prediction
* FgenesH - FGENESH ab initio gene prediction
* Repeatmasker - RepeatMasker identified repeat
* RepeatRunner - RepeatRunner identified repeat from the repeat protein database
* tRNAScan - tRNAScan-SE tRNA predictions (coming soon)
* PASA - PASA gene predictions (coming soon)

There are other sources that I noticed in my gff file, like cdna2genome. Is there any other detailed documentation explaining such sources besides those listed above?

Thanks
Qihua

From dence at genetics.utah.edu Thu Jan 12 07:28:24 2017
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 12 Jan 2017 13:28:24 +0000
Subject: [maker-devel] gff file: possible sources
In-Reply-To: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>
References: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>

Hi Qihua,

the cdna2genome is the polished tblastx alignments from Exonerate. Basically, the source column should be the name of the tool that generated the alignment, prediction, or gene model.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
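
To see which sources actually occur in a given MAKER GFF file, the second column can simply be tallied. The sketch below assumes a standard tab-separated GFF3 layout with an optional ##FASTA section at the end; the function name count_sources and the file name in the usage comment are hypothetical.

```python
from collections import Counter


def count_sources(lines):
    """Tally column 2 (source) over the feature lines of a GFF3 file.

    Hypothetical helper for illustration; `lines` is any iterable of
    text lines, for example an open file handle.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


# Example usage (the file name is an assumption):
# with open("genome.all.gff") as handle:
#     for source, n in count_sources(handle).most_common():
#         print(source, n)
```

Any source name that turns up in the tally but not in the tutorial list, such as cdna2genome, can then be looked up or asked about individually.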

From patel.kumar.vipul at gmail.com Fri Jan 20 02:44:26 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 09:44:26 +0100
Subject: [maker-devel] Maker crash for long chrm.

Hi,

I hope someone can help me to figure out what is actually going wrong.

I installed Maker 2.31.9, MPICH, BioPerl 1.7 via CPAN, and pointed the TMP variable not to use NFS. The given testcase as well for 1k 1MB runs without any problems.

Applying it to a sequence, for example with 57MB, it fails. I tried it as well with different sequences around 60MB, with the same outcome.

I looked into the logs, but they were not really helpful, as they just stated that the job failed.

It crashed with the following message:

deleted:0 genes
substr outside of string at /usr/share/perl/5.18/Carp.pm line 165.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/Bio/Root/Root.pm:447
STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/Bio/Tools/CodonTable.pm:419
STACK: CGL::TranslationMachine::longest_translation_plus_stop programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280
STACK: maker::auto_annotator::get_translation_seq programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236
STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../lib/Widget/snap.pm:974
STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/snap.pm:690
STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194
STACK: Process::MpiChunk::_go programs/maker/maker/bin/../lib/Process/MpiChunk.pm:1469
STACK: Process::MpiChunk::run programs/maker/maker/bin/../lib/Process/MpiChunk.pm:341
STACK: programs/maker/maker/bin/maker:979
-----------------------------------------------------------
--> rank=16, hostname=dummy
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:chr_test

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:chr_test

examining contents of the fasta file and run log

--Next Contig--

Processing run.log file...

I got the same message if I run it without MPI, so I guess it is not an MPI issue.
How can I find out if some jobs died, which could maybe lead to this problem?
Any other ideas how I can tackle this problem?

Kind regards

From patel.kumar.vipul at gmail.com Fri Jan 20 07:34:28 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 14:34:28 +0100
Subject: [maker-devel] Maker crash for long chrm.

Solved. After some digging and printing I found the problem.

It was snap itself!

For anybody who runs into the same problem, check snap. Apparently it was not compiled correctly and therefore produced non-conforming output! Recompiling solved my issue.

Kind regards

From carsonhh at gmail.com Fri Jan 20 16:00:49 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 20 Jan 2017 15:00:49 -0700
Subject: [maker-devel] Maker crash for long chrm.
Message-ID: <59841676-741F-496D-9E47-7750417033A4@gmail.com>

I'm glad it's working for you. Let us know if anything else comes up.

--Carson

From mayabritstein at gmail.com Mon Jan 23 02:30:40 2017
From: mayabritstein at gmail.com (Maya Britstein)
Date: Mon, 23 Jan 2017 10:30:40 +0200
Subject: [maker-devel] Authorization failed.
Message-ID:

Hi,

I can't access the maker-devel archives. I am entering my email, and what I think is my password, but it still doesn't work.

thanks,
Maya
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bmoore at genetics.utah.edu Mon Jan 23 06:43:53 2017
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon, 23 Jan 2017 12:43:53 +0000
Subject: [maker-devel] Authorization failed.
In-Reply-To:
References:
Message-ID:

Hi Maya,

If you follow the link below, you will find at the bottom of the page a portion of the form that allows you to reset your password. It's a little misleading because it looks like it's only an "Unsubscribe" option, but it also takes you to a page that allows you to update your subscription details, including password reminder/reset. The actual text for the portion of the page you're looking for is this:

'To unsubscribe from maker-devel, get a password reminder, or change your subscription options enter your subscription email address:'

The link is: http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Regards,
Barry

On Jan 23, 2017, at 1:30 AM, Maya Britstein wrote:

Hi,

I can't access the maker-devel archives. I am entering my email, and what I think is my password, but still it doesn't work.

thanks,
Maya
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From daren.card at gmail.com Tue Jan 24 08:06:22 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Tue, 24 Jan 2017 08:06:22 -0600
Subject: [maker-devel] Maker error: Invalid nucleotide
Message-ID:

Hi everyone,

I'm getting an error with an ongoing Maker run that I'm trying to troubleshoot.
This is on a second Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I'm getting the following type of error message for each contig:

...
Widget::augustus:
/opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus
#-------------------------------#

/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
Invalid nucleotide '8' encountered.


/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
Invalid nucleotide '8' encountered.

ERROR: Augustus failed
--> rank=7, hostname=moonunit0
ERROR: Failed while preparing ab-inits
ERROR: Chunk failed at level:0, tier_type:2
FAILED CONTIG:scaffold-92

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold-92

examining contents of the fasta file and run log
...

Augustus is apparently encountering '8' nucleotides, which is weird. I've looked within the contig fasta file in /tmp/ and there are no '8's anywhere except the header lines. Everything else appears to be running without issues.

Any guidance on how I might further interpret and solve this issue would be greatly appreciated. I can provide more information if necessary.

Thanks,
Daren Card

UT-Arlington

From carsonhh at gmail.com Wed Jan 25 11:37:50 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Jan 2017 10:37:50 -0700
Subject: [maker-devel] Maker error: Invalid nucleotide
In-Reply-To:
References:
Message-ID: <5E13AB7E-9175-4440-AD62-A53BD9DD8DE1@gmail.com>

Try running the contig in question (scaffold-92) as a separate MAKER run. That may help indicate whether the issue is a corrupt intermediate file (if it is, you can set clean_try=1 to force deletion of intermediate files before the rerun).

--Carson

> On Jan 24, 2017, at 7:06 AM, Daren C.
Card wrote: > > Hi everyone, > > I?m getting an error with an ongoing Maker run that I?m trying to troubleshoot. This is on a 2nd Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I?m getting the following type of error message for each contig: > > ? > Widget::augustus: > /opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus > #-------------------------------# > > /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR > Invalid nucleotide '8' encountered. > > > /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR > Invalid nucleotide '8' encountered. > > ERROR: Augustus failed > --> rank=7, hostname=moonunit0 > ERROR: Failed while preparing ab-inits > ERROR: Chunk failed at level:0, tier_type:2 > FAILED CONTIG:scaffold-92 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:scaffold-92 > > examining contents of the fasta file and run log > ? > > Augustus is apparently encountering ?8? nucleotides, which is weird. I?ve looked within the contig fasta file in /tmp/ and there are no ?8?s anywhere except the header lines. Everything else appears to be running without issues. > > Any guidance on how I might further interpret and solve this issue would be greatly appreciated. Can provide more information if necessary. 
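One way to double-check the sequence lines, beyond looking through the file by eye, is to scan the FASTA for any character outside the expected alphabet. The following is only a rough sketch and not part of MAKER or Augustus: the function name `find_odd_bases` is made up for illustration, and it assumes that only A, C, G, T and N (either case) are expected, so stray digits, carriage returns or other bytes would all be reported.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MAKER or Augustus).
# Prints every sequence line that contains a character other than
# A, C, G, T or N, together with the record name and file line number.
find_odd_bases() {
    awk '
        /^>/ { name = substr($0, 2); next }   # remember the current record header
        /[^ACGTNacgtn]/ {                     # sequence line with an unexpected character
            printf "%s\tline %d\t%s\n", name, NR, $0
        }
    ' "$1"
}

# Example call (path is a placeholder):
#   find_odd_bases /path/to/contig.fasta
```

If this prints nothing for the file Augustus was given, the unexpected character is probably being introduced elsewhere, which would fit the corrupt intermediate file possibility Carson raises.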
> > Thanks, > Daren Card > > UT-Arlington > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From scott at scottcain.net Wed Jan 25 14:23:02 2017 From: scott at scottcain.net (Scott Cain) Date: Wed, 25 Jan 2017 15:23:02 -0500 Subject: [maker-devel] GFF3 file format In-Reply-To: References: Message-ID: Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein wrote: > Hi, > > I have RNA-seq data, and genomic data that I want to annotate using maker. > > From what I understood, I need to genarate a gff3 file format from the > RNA-seq mapping sequences. I had mapped the RNA sequences to the genome > using bowtie and tophat. However, I still do not know how to take these > format and convert them to a gff3 file that I can them use in maker as > annotation evidence > > I saw the wiki page, that did not mention how to make this conversion ( > http://gmod.org/wiki/GFF3) > > Can you please help me? > > Sincerely, > Maya > > ---- > Maya Britstein > Ph.D candidate > Laura Steindler's Lab > Marine Biology Department > Leon H. Charney School of Marine Sciences > University of Haifa, Israel > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cjfields at illinois.edu Wed Jan 25 16:03:51 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 25 Jan 2017 22:03:51 +0000 Subject: [maker-devel] GFF3 file format In-Reply-To: References: Message-ID: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly. But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. remove low confidence transcripts) and bring them as EST evidence. chris From: maker-devel > on behalf of Scott Cain > Date: Wednesday, January 25, 2017 at 2:23 PM To: Maya Britstein > Cc: "maker-devel at yandell-lab.org List" >, "help at gmod.org" > Subject: Re: [maker-devel] GFF3 file format Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein > wrote: Hi, I have RNA-seq data, and genomic data that I want to annotate using maker. From what I understood, I need to genarate a gff3 file format from the RNA-seq mapping sequences. I had mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to take these format and convert them to a gff3 file that I can them use in maker as annotation evidence I saw the wiki page, that did not mention how to make this conversion (http://gmod.org/wiki/GFF3) Can you please help me? Sincerely, Maya ---- Maya Britstein Ph.D candidate Laura Steindler's Lab Marine Biology Department Leon H. Charney School of Marine Sciences University of Haifa, Israel -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Thu Jan 26 14:26:42 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Thu, 26 Jan 2017 15:26:42 -0500 Subject: [maker-devel] canonical protein sequences or isoform? Message-ID: Hello: I am doing annotation on a new genome and collecting proteins from mouse. I found there are both canonical protein sequences and isoforms. I wonder whether I should use only cannonical protein sequences or both the canonical and isoforms? Thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainer.rutka at uni-konstanz.de Fri Jan 27 04:31:40 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Fri, 27 Jan 2017 11:31:40 +0100 Subject: [maker-devel] Maker-Error when started with OpenMPI Message-ID: Hi everybody. My name is Rainer. I am an administrator for our HPC-Systems at our university in Konstanz, Baden-Wuertemberg/Germany. The procect is called bwHPC-C5. See: https://www.bwhpc-c5.de/en/index.php I try to get Maker running on our bwUniCluster since weeks. Unfortunately i get errors while running a Maker job in the MPI-environment. BUILD STATUS ============================================================================== STATUS MAKER v2.31.9 ============================================================================== PERL Dependencies: VERIFIED External Programs: VERIFIED External C Libraries: VERIFIED MPI SUPPORT: ENABLED MWAS Web Interface: DISABLED MAKER PACKAGE: CONFIGURATION OK MODULES / INCLUDES / COMPILERS # knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9 # ##### (B) Dependencies: # # conflict: any other maker version # module load compiler/gnu/5.2 # module load mpi/openmpi/2.0-gnu-5.2 [...] MPI/MOAB SUBMIT [...] 
### Queues ### #MSUB -q fat #MSUB -l nodes=1:ppn=16 #MSUB -l mem=20gb #MSUB -l walltime=50:00:00 # [...] echo " " echo "### Loading MAKER module:" echo " " module load bio/maker/2.31.9 [ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; } echo "MAKER_VERSION = $MAKER_VERSION" module list [...] echo " " echo "### Runing Maker example" echo " " export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so export OMPI_MCA_mpi_warn_on_fork=0 echo "LD_PRELOAD=${LD_PRELOAD}" # # "STATUS: Processing and indexing input FASTA files..." # mpiexec -mca btl ^openib -n 16 maker [...] E R R O R S ======= [...] LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... [uc1n338:113607] *** Process received signal *** [uc1n338:113607] Signal: Segmentation fault (11) [uc1n338:113607] Signal code: Address not mapped (1) [uc1n338:113607] Failing at address: 0x4b0 [uc1n338:113608] *** Process received signal *** [uc1n338:113608] Signal: Segmentation fault (11) [uc1n338:113608] Signal code: Address not mapped (1) [uc1n338:113608] Failing at address: 0x4b0 [uc1n338:113621] *** Process received signal *** [uc1n338:113621] Signal: Segmentation fault (11) [uc1n338:113621] Signal code: Address not mapped (1) [uc1n338:113621] Failing at address: 0x4b0 -------------------------------------------------------------------------- mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault). -------------------------------------------------------------------------- [...] WHATS WRONG HERE!? Thank you for your help! All the best , Rainer -- Rainer Rutka University of Konstanz Communication, Information, Media Centre (KIM) * High-Performance-Computing (HPC) * KIM-Support and -Base-Services Room: V511 78457 Konstanz, Germany +49 7531 88-5413 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5055 bytes Desc: S/MIME Cryptographic Signature URL: From michael.s.campbell1 at gmail.com Fri Jan 27 09:36:11 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Fri, 27 Jan 2017 10:36:11 -0500 Subject: [maker-devel] canonical protein sequences or isoform? In-Reply-To: References: Message-ID: I give MAKER all isoforms as evidence. Mike > On Jan 26, 2017, at 3:26 PM, Quanwei Zhang wrote: > > Hello: > > I am doing annotation on a new genome and collecting proteins from mouse. I found there are both canonical protein sequences and isoforms. I wonder whether I should use only cannonical protein sequences or both the canonical and isoforms? > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From qwzhang0601 at gmail.com Fri Jan 27 10:13:22 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Fri, 27 Jan 2017 11:13:22 -0500 Subject: [maker-devel] transcript assembly of RNA-seq data Message-ID: Hello: I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly. (1) De novo assembly without mapping to any genome assembly (like Trinity)? (2) TopHat+Cufflink do mapping to the new genome assembly, that want to annotate? (3) TopHat+Cufflink do mapping to a close annotated genome (like mouse or human)? Thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Fri Jan 27 10:23:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 09:23:40 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To:
References:
Message-ID: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>

(1) De novo assembly without mapping to any genome assembly (like Trinity)

You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.

--Carson

> On Jan 27, 2017, at 9:13 AM, Quanwei Zhang wrote:
>
> Hello:
>
> I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly.
> (1) De novo assembly without mapping to any genome assembly (like Trinity)?
> (2) TopHat+Cufflink do mapping to the new genome assembly, that want to annotate?
> (3) TopHat+Cufflink do mapping to a close annotated genome (like mouse or human)?
>
> Thanks
>
> Best
> Quanwei
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cjfields at illinois.edu Fri Jan 27 16:21:15 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jan 2017 22:21:15 +0000
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
References: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
Message-ID: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>

Yup, I agree. Carson, would you know of any instances where HISAT2/STAR+StringTie or reference-based Trinity assemblies were (successfully) used?
chris

From: maker-devel on behalf of Carson Holt
Date: Friday, January 27, 2017 at 10:23 AM
To: Quanwei Zhang
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] transcript assembly of RNA-seq data

(1) De novo assembly without mapping to any genome assembly (like Trinity)

You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.

--Carson

On Jan 27, 2017, at 9:13 AM, Quanwei Zhang wrote:

Hello:

I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly.
(1) De novo assembly without mapping to any genome assembly (like Trinity)?
(2) TopHat+Cufflink do mapping to the new genome assembly, that want to annotate?
(3) TopHat+Cufflink do mapping to a close annotated genome (like mouse or human)?

Thanks

Best
Quanwei

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Jan 27 18:53:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 17:53:10 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
References: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com> <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
Message-ID:

No. My experience has just been with regular Trinity de novo assembly. Of course, I'd be interested in anyone else's attempt at this though.

--Carson

> On Jan 27, 2017, at 3:21 PM, Fields, Christopher J wrote:
>
> Yup, I agree. Carson, would you know of any instances where HISAT2/STAR+StringTie or reference-based Trinity assemblies were (successfully) used?
>
> chris
>
> From: maker-devel on behalf of Carson Holt
> Date: Friday, January 27, 2017 at 10:23 AM
> To: Quanwei Zhang
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] transcript assembly of RNA-seq data
>
>> (1) De novo assembly without mapping to any genome assembly (like Trinity)
>>
>> You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.
>>
>> --Carson
>>
>>
>>> On Jan 27, 2017, at 9:13 AM, Quanwei Zhang wrote:
>>>
>>> Hello:
>>>
>>> I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly.
>>> (1) De novo assembly without mapping to any genome assembly (like Trinity)?
>>> (2) TopHat+Cufflink do mapping to the new genome assembly, that want to annotate?
>>> (3) TopHat+Cufflink do mapping to a close annotated genome (like mouse or human)?
>>>
>>> Thanks
>>>
>>> Best
>>> Quanwei
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Sat Jan 28 14:53:45 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 28 Jan 2017 13:53:45 -0700
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To:
References:
Message-ID: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>

Try adding one of the following to your mpiexec command -->

1. --mca btl ^openib
2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0

One or the other may fix your issue.
The first causes OpenMPI to not use the InfiniBand communication option (InfiniBand libraries use registered memory in a way that causes system calls to generate segfaults). It will usually force communication to go over another adapter. The second tries to use the InfiniBand adapter, but uses TCP over InfiniBand (a way to indirectly bypass the problem-causing libraries). The third specifically forces the use of the Ethernet adapter instead of the InfiniBand adapter.

--Carson

> On Jan 27, 2017, at 3:31 AM, Rainer Rutka wrote:
>
> Hi everybody.
>
> My name is Rainer. I am an administrator for our HPC-Systems at our
> university in Konstanz, Baden-Wuertemberg/Germany.
> The procect is called bwHPC-C5.
>
> See: https://www.bwhpc-c5.de/en/index.php
>
> I try to get Maker running on our bwUniCluster since weeks. Unfortunately
> i get errors while running a Maker job in the MPI-environment.
>
> BUILD STATUS
>
> ==============================================================================
> STATUS MAKER v2.31.9
> ==============================================================================
> PERL Dependencies: VERIFIED
> External Programs: VERIFIED
> External C Libraries: VERIFIED
> MPI SUPPORT: ENABLED
> MWAS Web Interface: DISABLED
> MAKER PACKAGE: CONFIGURATION OK
>
> MODULES / INCLUDES / COMPILERS
>
> # knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9
> #
> ##### (B) Dependencies:
> #
> # conflict: any other maker version
> # module load compiler/gnu/5.2
> # module load mpi/openmpi/2.0-gnu-5.2
> [...]
>
> MPI/MOAB SUBMIT
>
> [...]
> ### Queues ###
> #MSUB -q fat
> #MSUB -l nodes=1:ppn=16
> #MSUB -l mem=20gb
> #MSUB -l walltime=50:00:00
> #
> [...]
> echo " "
> echo "### Loading MAKER module:"
> echo " "
> module load bio/maker/2.31.9
> [ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
> echo "MAKER_VERSION = $MAKER_VERSION"
> module list
> [...]
> echo " " > echo "### Runing Maker example" > echo " " > export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so > export OMPI_MCA_mpi_warn_on_fork=0 > > echo "LD_PRELOAD=${LD_PRELOAD}" > # > # "STATUS: Processing and indexing input FASTA files..." > # > mpiexec -mca btl ^openib -n 16 maker > [...] > > > E R R O R S > ======= > [...] > LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > [uc1n338:113607] *** Process received signal *** > [uc1n338:113607] Signal: Segmentation fault (11) > [uc1n338:113607] Signal code: Address not mapped (1) > [uc1n338:113607] Failing at address: 0x4b0 > [uc1n338:113608] *** Process received signal *** > [uc1n338:113608] Signal: Segmentation fault (11) > [uc1n338:113608] Signal code: Address not mapped (1) > [uc1n338:113608] Failing at address: 0x4b0 > [uc1n338:113621] *** Process received signal *** > [uc1n338:113621] Signal: Segmentation fault (11) > [uc1n338:113621] Signal code: Address not mapped (1) > [uc1n338:113621] Failing at address: 0x4b0 > -------------------------------------------------------------------------- > mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault). > -------------------------------------------------------------------------- > [...] > > WHATS WRONG HERE!? > > Thank you for your help! 
> > All the best , > > Rainer > > -- > Rainer Rutka > University of Konstanz > Communication, Information, Media Centre (KIM) > * High-Performance-Computing (HPC) > * KIM-Support and -Base-Services > Room: V511 > 78457 Konstanz, Germany > +49 7531 88-5413 > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From rainer.rutka at uni-konstanz.de Mon Jan 30 02:32:08 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Mon, 30 Jan 2017 09:32:08 +0100 Subject: [maker-devel] Maker-Error when started with OpenMPI In-Reply-To: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com> References: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com> Message-ID: Hi Carson! Thank you VERY MUCH for your hints. Much appreciated! I'll test these today and let you know about the results. Again: THANKS! :-) BTW: I'm not a scientist. Only a system operator. :-) Am 28.01.2017 um 21:53 schrieb Carson Holt: > Try adding one of the following to your mpiexec command ?> > 1. --mca btl ^openib > 2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 > 3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0 > One or the other may fix your issue. The first causes OpenMPI to not use the infiniband communication option (infiniband libraries use registered memory in a way that causes system calls to generate segfaults). It will usually force communication to go over another adapter. The second tries to use the infiband adapter, but uses TCP over infiniband (way to indirectly bypass problem causing libraries). The third specifically forces the use of the ethernet adapter instead of infiniband adapter. 
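The three alternatives Carson lists can be written out as complete command lines for a job script. The snippet below is only a dry-run sketch: it prints the candidate commands rather than launching anything, the function name `print_mpiexec_variants` is invented for illustration, and the process count and trailing `maker` call are placeholders taken from Rainer's script. Whether the interfaces are really called `ib0` and `eth0` depends on the node.

```shell
#!/bin/sh
# Dry-run sketch: prints the three mpiexec variants from Carson's reply
# so each can be copied into a job script and tried in turn.
# The process count and the trailing "maker" command are placeholders.
print_mpiexec_variants() {
    nprocs="${1:-16}"
    # 1. exclude the openib transport
    printf 'mpiexec --mca btl ^openib -n %s maker\n' "$nprocs"
    # 2. TCP over the InfiniBand interface (assumed to be named ib0)
    printf 'mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 -n %s maker\n' "$nprocs"
    # 3. TCP over the Ethernet interface (assumed to be named eth0)
    printf 'mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include eth0 -n %s maker\n' "$nprocs"
}

print_mpiexec_variants 16
```

Trying the variants one at a time keeps it clear which transport setting, if any, makes the segmentation fault go away.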
> --Carson -- Rainer Rutka University of Konstanz Communication, Information, Media Centre (KIM) * High-Performance-Computing (HPC) * KIM-Support and -Base-Services Room: V511 78457 Konstanz, Germany +49 7531 88-5413 -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5055 bytes Desc: S/MIME Cryptographic Signature URL: From qwzhang0601 at gmail.com Tue Jan 31 11:36:13 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 31 Jan 2017 12:36:13 -0500 Subject: [maker-devel] collecting protein sequences as evidences Message-ID: I wonder what's the best way to collect protein sequences for gene annotation of a de novo genome assembly. (1) My first choice is to get protein sequences of human and mouse from UniProt. At this step, I am not clear whether I should download the reviewed ones (i.e., SWISS-prot) or automatically annotated ones (i.e., TrEMBL). (2) On ther other hand, I also get protein sequences from NCBI, should I just simply merge those fasta files. Does it matter if there are redundancies? And also, if I get protein sequences from different sources, they may not have the same quality. Do I need to do something before I integrate protein sequences from different sources? Many thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Tue Jan 31 13:08:21 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 31 Jan 2017 14:08:21 -0500 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals Message-ID: Hello: I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. 
Merging all samples into one, we will have a much higher coverage level, but I think there may be some genes expressed as tissue-specific isoforms. So I am not sure whether I should merge RNA-seq from different tissues.

What's more, I found some published RNA-seq data from another individual (and also from a different tissue than ours) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assemblies and use all those assemblies as input to Maker?

Thanks
Best
Quanwei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Tue Jan 31 13:26:29 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 14:26:29 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals
In-Reply-To:
References:
Message-ID: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>

I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER as a comma-separated list in the opts file (maker_opts.ctl).

Example:
est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta

Good luck,
Mike

> On Jan 31, 2017, at 2:08 PM, Quanwei Zhang wrote:
>
> Hello:
>
> I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues.
> What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species.
Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker?
>
> Thanks
> Best
> Quanwei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From michael.s.campbell1 at gmail.com Tue Jan 31 14:57:28 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 15:57:28 -0500
Subject: [maker-devel] collecting protein sequences as evidences
In-Reply-To:
References:
Message-ID: <2E4D90C9-6D6E-4F52-A361-AFB06A61D2C2@gmail.com>

Hi Quanwei,

(1) When I use UniProt, I use Swiss-Prot and not TrEMBL.
(2) I don't merge files together. I just pass them all to MAKER as a comma-separated list.

Thanks,
Mike

> On Jan 31, 2017, at 12:36 PM, Quanwei Zhang wrote:
>
> I wonder what's the best way to collect protein sequences for gene annotation of a de novo genome assembly.
> (1) My first choice is to get protein sequences of human and mouse from UniProt. At this step, I am not clear whether I should download the reviewed ones (i.e., SWISS-prot) or automatically annotated ones (i.e., TrEMBL).
> (2) On ther other hand, I also get protein sequences from NCBI, should I just simply merge those fasta files. Does it matter if there are redundancies? And also, if I get protein sequences from different sources, they may not have the same quality. Do I need to do something before I integrate protein sequences from different sources?
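The comma-separated lists Mike describes can be built from separate FASTA paths instead of concatenating the files. This is a small sketch under stated assumptions: `join_by_comma` is a made-up helper, all paths are placeholders, and it assumes the `protein=` entry of the opts file accepts the same comma-separated form Mike shows for `est=`.

```shell
#!/bin/sh
# Hypothetical helper: join file paths with commas for a MAKER opts-file value.
join_by_comma() {
    (IFS=,; printf '%s\n' "$*")   # "$*" joins arguments with the first IFS character; subshell keeps IFS local
}

# Placeholder paths; substitute the real evidence files.
echo "est=$(join_by_comma /PATH/TO/tissue1.fasta /PATH/TO/tissue2.fasta)"
echo "protein=$(join_by_comma /PATH/TO/uniprot_sprot.fasta /PATH/TO/ncbi_proteins.fasta)"
```

The printed lines can then be pasted into the opts file, which keeps each evidence source as its own file.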
> > Many thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From cjfields at illinois.edu Tue Jan 31 15:05:43 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 31 Jan 2017 21:05:43 +0000 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> Message-ID: I agree with Mike. I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality. chris On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" wrote: I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file. Example: est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta Good luck, Mike > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang wrote: > > Hello: > > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues. 
> What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker? > > Thanks > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mjfi2sb3 at gmail.com Tue Jan 31 15:14:14 2017 From: mjfi2sb3 at gmail.com (Salim Bougouffa) Date: Tue, 31 Jan 2017 21:14:14 +0000 Subject: [maker-devel] GFF3 file format In-Reply-To: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> References: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> Message-ID: Hi Christopher, How would you identify a low confidence transcript? And how do you remove them? Also, did you try setting a minimum read coverage in Trinity as the default is one? Best /SB On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, wrote: > If I recall, from a BAM you would need to run a reference-based assembly > on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use > Trinity for ref-based assembly. But I always choose the route of a full de > novo assembly (again, Trinity or similar) when possible, doing some basic > cleanup (e.g. remove low confidence transcripts) and bring them as EST > evidence. 
> > chris > > From: maker-devel on behalf of > Scott Cain > Date: Wednesday, January 25, 2017 at 2:23 PM > To: Maya Britstein > Cc: "maker-devel at yandell-lab.org List" , " > help at gmod.org" > Subject: Re: [maker-devel] GFF3 file format > > Hi Maya, > > I'm not sure what MAKER's requirements are in this regard--I'm forwarding > this to their mailing list. > > Scott > > > On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein > wrote: > > Hi, > > I have RNA-seq data, and genomic data that I want to annotate using maker. > > From what I understood, I need to genarate a gff3 file format from the > RNA-seq mapping sequences. I had mapped the RNA sequences to the genome > using bowtie and tophat. However, I still do not know how to take these > format and convert them to a gff3 file that I can them use in maker as > annotation evidence > > I saw the wiki page, that did not mention how to make this conversion ( > http://gmod.org/wiki/GFF3 > > ) > > Can you please help me? > > Sincerely, > Maya > > ---- > Maya Britstein > Ph.D candidate > Laura Steindler's Lab > Marine Biology Department > Leon H. Charney School of Marine Sciences > University of Haifa, Israel > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/ > ) > 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- ____________________________ Sent from Inbox Mobile -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From qwzhang0601 at gmail.com Tue Jan 31 15:33:12 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 31 Jan 2017 16:33:12 -0500 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> Message-ID: Thank you guys for your suggestions. So you do not suggest to use RNA-seq data from another study, even I assemble them separately and then provide both assemblies into Maker as a comma separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. Best Quanwei 2017-01-31 16:05 GMT-05:00 Fields, Christopher J : > I agree with Mike. I also suggest not combining RNA-Seqs from different > runs (e.g. different studies) even if they are from the same tissue, > development stage etc. There are many other factors (biological variation, > sample quality, sequencing chemistry or technology differences, etc) that > can significantly and negatively impact trx assembly quality. > > chris > > On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" < > maker-devel-bounces at yandell-lab.org on behalf of > michael.s.campbell1 at gmail.com> wrote: > > I would probably try merging the replicates but not the tissues. You > can then pass the output files to MAKER in a comma separated list in the > opts file. > > Example: > est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta > > Good luck, > Mike > > > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > I am trying to assemble transcripts using RNA-seq data by the tool > Trinity, which will be used for gene annotation for Maker. Now I have data > from two tissues with two replicates each. Should I merge all four samples > to get one assembly file? 
Or should I merge replicates of each tissue > separately and use the two assembly files as input of Maker. Merging all > samples into one, we will have much higher coverage level, but I think > there may be some genes expressed by tissue-specific isoforms. So I not > sure whether I should merge RNA-seq from different tissues. > > What's more, I find some published RNA-seq data from another > individual (and also for different tissue from us) for the same species. > Should I merge all RNA-seq together (across individuals and tissues)? Or > should I generate different transcript assembly and use all those > assemblies as input to Maker? > > > > Thanks > > Best > > Quanwei > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_ > yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_ > yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Jan 31 15:35:20 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 31 Jan 2017 14:35:20 -0700 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> Message-ID: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> I think he means not to combine them for the transcript assembly preparation (i.e. assemble them separately). But you still provide them all to maker as a comma-separated list. --Carson > On Jan 31, 2017, at 2:33 PM, Quanwei Zhang wrote: > > Thank you guys for your suggestions. So you do not suggest to use RNA-seq data from another study, even I assemble them separately and then provide both assemblies into Maker as a comma separated list.
The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. > > Best > Quanwei > > 2017-01-31 16:05 GMT-05:00 Fields, Christopher J >: > I agree with Mike. I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality. > > chris > > On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" on behalf of michael.s.campbell1 at gmail.com > wrote: > > I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file. > > Example: > est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta > > Good luck, > Mike > > > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues. > > What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker? 
> > > > Thanks > > Best > > Quanwei > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Tue Jan 31 17:05:43 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 31 Jan 2017 23:05:43 +0000 Subject: [maker-devel] GFF3 file format In-Reply-To: References: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> Message-ID: <8BD384C9-4E46-42AC-A59F-96299EF5E104@illinois.edu> You can use RSEM for some initial filtering: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification#filtering-transcripts Then I generally use the Trinity QA steps, in particular TransRate or DETONATE: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment chris From: Salim Bougouffa Date: Tuesday, January 31, 2017 at 3:14 PM To: Chris Fields , Scott Cain , Maya Britstein Cc: "maker-devel at yandell-lab.org List" , "help at gmod.org" Subject: Re: [maker-devel] GFF3 file format Hi Christopher, How would you identify a low confidence transcript? And how do you remove them? Also, did you try setting a minimum read coverage in Trinity as the default is one? Best /SB On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, > wrote: If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. 
Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly. But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. remove low confidence transcripts) and bring them as EST evidence. chris From: maker-devel > on behalf of Scott Cain > Date: Wednesday, January 25, 2017 at 2:23 PM To: Maya Britstein > Cc: "maker-devel at yandell-lab.org List" >, "help at gmod.org" > Subject: Re: [maker-devel] GFF3 file format Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein > wrote: Hi, I have RNA-seq data, and genomic data that I want to annotate using maker. From what I understood, I need to genarate a gff3 file format from the RNA-seq mapping sequences. I had mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to take these format and convert them to a gff3 file that I can them use in maker as annotation evidence I saw the wiki page, that did not mention how to make this conversion (http://gmod.org/wiki/GFF3) Can you please help me? Sincerely, Maya ---- Maya Britstein Ph.D candidate Laura Steindler's Lab Marine Biology Department Leon H. Charney School of Marine Sciences University of Haifa, Israel -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- ____________________________ Sent from Inbox Mobile -------------- next part -------------- An HTML attachment was scrubbed... 
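Editorial sketch of the expression-based filtering mentioned above. This is not the Trinity helper itself; the Trinity wiki pages linked in the message describe the supported route. It assumes an RSEM `isoforms.results`-style tab-separated table with `transcript_id` and `TPM` columns, and the cutoff of 1.0 TPM is an arbitrary illustration rather than a recommendation from the thread. The function names are hypothetical.

```python
import csv
import io


def ids_above_tpm(rsem_table, min_tpm=1.0):
    """Return the transcript IDs whose TPM is at least min_tpm.

    rsem_table is the text of a tab-separated table with a header row
    containing at least 'transcript_id' and 'TPM' (assumed RSEM layout).
    """
    reader = csv.DictReader(io.StringIO(rsem_table), delimiter="\t")
    return {row["transcript_id"] for row in reader if float(row["TPM"]) >= min_tpm}


def filter_fasta(fasta_text, keep_ids):
    """Keep FASTA records whose ID (first word of the header) is in keep_ids."""
    out = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in keep_ids
        if keep:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

The filtered FASTA could then be passed to MAKER as EST evidence in the same way as any other assembly. Whether a TPM cutoff alone is enough to call a transcript low confidence is a judgment call, and the TransRate or DETONATE checks mentioned in the message remain a separate step.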
URL: From cjfields at illinois.edu Tue Jan 31 17:07:44 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 31 Jan 2017 23:07:44 +0000 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> Message-ID: Exactly chris From: Carson Holt Date: Tuesday, January 31, 2017 at 3:35 PM To: Quanwei Zhang Cc: Chris Fields , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals I think he means not to combine them for the transcript assembly preparation (i.e. assembly them separately). But you still provide them all to maker as a comma separated list. ?Carson On Jan 31, 2017, at 2:33 PM, Quanwei Zhang > wrote: Thank you guys for your suggestions. So you do not suggest to use RNA-seq data from another study, even I assemble them separately and then provide both assemblies into Maker as a comma separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. Best Quanwei 2017-01-31 16:05 GMT-05:00 Fields, Christopher J >: I agree with Mike. I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality. chris On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" on behalf of michael.s.campbell1 at gmail.com> wrote: I would probably try merging the replicates but not the tissues. 
You can then pass the output files to MAKER in a comma separated list in the opts file. Example: est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta Good luck, Mike > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang > wrote: > > Hello: > > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues. > What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker? > > Thanks > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
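Editorial sketch of the advice in this thread: assemble each tissue or study separately, then hand all assemblies to MAKER as one comma-separated `est=` value. It assumes the opts file contains a line beginning with `est=`, optionally followed by a `#` comment. The function names and example paths are hypothetical, and nothing here is part of MAKER itself.

```python
import re


def build_est_value(paths):
    """Join assembly FASTA paths into a comma-separated value for est=."""
    seen = []
    for p in paths:
        p = p.strip()
        if not p:
            continue
        # A comma or space inside a path would be ambiguous in the list.
        if "," in p or any(ch.isspace() for ch in p):
            raise ValueError(f"path cannot contain commas or whitespace: {p!r}")
        if p not in seen:  # drop exact duplicates, keep first-seen order
            seen.append(p)
    return ",".join(seen)


def set_est_option(opts_text, paths):
    """Return opts_text with the est= line rewritten, keeping a trailing #comment."""
    value = build_est_value(paths)
    out = []
    replaced = False
    for line in opts_text.splitlines():
        m = re.match(r"^est=[^#]*(#.*)?$", line)
        if m and not replaced:
            comment = m.group(1)
            line = f"est={value}" + (f" {comment}" if comment else "")
            replaced = True
        out.append(line)
    if not replaced:
        out.append(f"est={value}")
    return "\n".join(out) + "\n"
```

For example, `set_est_option(text, ["/PATH/TO/brain_trinity.fasta", "/PATH/TO/liver_trinity.fasta"])` leaves other options such as `est_gff=` untouched, because the pattern only matches a line that starts with exactly `est=`.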
URL: From xvazquezc at gmail.com Thu Jan 5 19:23:17 2017 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Fri, 6 Jan 2017 13:23:17 +1100 Subject: [maker-devel] Unable to train SNAP In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> Message-ID: Are you using the -n option with maker2zff? You often get empty genome.ann and genome.dna files if you don't. On 5 January 2017 at 21:07, Oleg Lovky wrote: > Hello, > > > > I?m running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) > containing ~50k reads (length ranges from 70 to 12000). > > However, I?m not getting transcript and proteins fasta files at all, > despite Maker not giving any errors and everything is listed as finished in > the datastore log file. > > Furthermore, when trying to use maker2zff I?m getting empty genome.ann and > genome.dna files. > > > > Please advise. > > > > Regards, > > > > Oleg Lovky, MSc. > > Research Engineer > > Institute of Plant Sciences > > ARO, Volcani Center > > Cell: 054-4870319 > > [image: v95_15] > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Xabier V?zquez-Campos, *PhD* *Research Associate* Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image001.png Type: image/png Size: 16191 bytes Desc: not available URL: From carsonhh at gmail.com Fri Jan 6 12:28:02 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 6 Jan 2017 12:28:02 -0700 Subject: [maker-devel] Unable to train SNAP In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> Message-ID: <8F65E561-7450-4B5A-8F1B-4E51C0D25BE2@gmail.com> The maker2zff script has a number of thresholds that must be reached to avoid filtering all models. If you don't have protein evidence in the dataset for example, then that filter may always be failing. You may just want to turn all filters off with the -n option as previously suggested. --Carson > On Jan 5, 2017, at 3:07 AM, Oleg Lovky wrote: > > Hello, > > I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000). > However, I'm not getting transcript and proteins fasta files at all, despite Maker not giving any errors and everything is listed as finished in the datastore log file. > Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files. > > Please advise. > > Regards, > > Oleg Lovky, MSc. > Research Engineer > Institute of Plant Sciences > ARO, Volcani Center > Cell: 054-4870319 > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
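Editorial sketch for checking the symptom discussed in this thread, namely whether maker2zff wrote any gene models into genome.ann before SNAP training is attempted. It assumes the short ZFF layout, where a `>` line names a sequence and each feature line has four whitespace-separated fields (label, begin, end, model name). That layout and the function name are assumptions for illustration, not taken from the thread.

```python
def count_zff_models(ann_text):
    """Count distinct gene models per sequence in ZFF-style annotation text."""
    counts = {}
    current = None
    models = set()
    for line in ann_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous sequence before starting a new one.
            if current is not None:
                counts[current] = len(models)
            words = line[1:].split()
            current = words[0] if words else ""
            models = set()
        else:
            fields = line.split()
            if current is not None and len(fields) >= 4:
                models.add(fields[3])  # fourth field is the model name
    if current is not None:
        counts[current] = len(models)
    return counts
```

An empty dictionary, or one whose values are all zero, would match the situation described above, where every model was filtered out and rerunning maker2zff with the -n option was suggested.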
URL: From kchilds at msu.edu Thu Jan 5 07:28:00 2017 From: kchilds at msu.edu (Childs, Kevin) Date: Thu, 5 Jan 2017 14:28:00 +0000 Subject: [maker-devel] Repeat library construction - CRL scripts In-Reply-To: References: Message-ID: <6AE4044B-9011-4421-A6F1-FE3B95BBB11D@msu.edu> Rob, The scripts can be found in a link at the bottom of this wiki page: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced Kevin Childs --- Kevin Childs, PhD Assistant Professor - Fixed Term Center for Genomics-Enabled Plant Science Plant Biology Department Michigan State University kchilds at msu.edu 517-775-2844 (m) 517-884-6926 (o) http://childslab.plantbiology.msu.edu > On Jan 5, 2017, at 1:41 AM, Rob Syme wrote: > > Hi all > > The MAKER wiki page "Repeat Library Construction - Advanced" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately. Does anybody know where to find them? > > Thanks! > > Rob Syme > Research Associate > Curtin University > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From brubin at fieldmuseum.org Fri Jan 6 18:22:10 2017 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Fri, 6 Jan 2017 20:22:10 -0500 Subject: [maker-devel] /tmp full Message-ID: Hi all, Maker keeps filling up the /tmp directories on the cluster I am using. It appears that most of the space is taken with many versions of various blast databases. I suspect that this issue is partly due to my not using MPI and instead launching multiple instances of maker (typically 16) in the same working directory. However, it appears that maker is also leaving some of these databases in /tmp even after it has died or been killed and they are piling up. 
I am submitting my jobs to the cluster via SLURM but have installed maker locally rather than system-wide. My system administrator is going to try creating a larger locally mounted directory on some of the nodes for me but I wanted to check to see if you have any other suggestions to solve the issue or make sure that maker cleans up /tmp as aggressively as possible. I am using maker3-beta. Thanks for any help, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Sat Jan 7 16:29:29 2017 From: carsonhh at gmail.com (Carson Holt) Date: Sat, 7 Jan 2017 16:29:29 -0700 Subject: [maker-devel] /tmp full In-Reply-To: References: Message-ID: If you use the MPI settings, then all processes will share a single temporary directory, otherwise they each will have a separate one since they can't intercommunicate. MAKER tries to clean up its files on finish or failure, but if you or the system kill it with certain signals, then it is reaped immediately by the system and not allowed to finish cleaning up. Signals 9 and 19 for example will do that. If a failure is related to the drive being full or a memory issue, then your system may be hitting it with one of these uncatchable signals. For example SLURM may use signal 9 or 19 if a process fails to respond to signal 15 in a timely manner (i.e. MAKER may be removing files, but SLURM gets impatient and kills it more aggressively because it thinks the process is not responding). You can always try to empty /tmp as the first step in your batch script, and it will remove files belonging to you before launching MAKER. --Carson > On Jan 6, 2017, at 6:22 PM, Benjamin Rubin wrote: > > Hi all, > > Maker keeps filling up the /tmp directories on the cluster I am using. It appears that most of the space is taken with many versions of various blast databases.
I suspect that this issue is partly due to my not using MPI and instead launching multiple instances of maker (typically 16) in the same working directory. However, it appears that maker is also leaving some of these databases in /tmp even after it has died or been killed and they are piling up. > > I am submitting my jobs to the cluster via SLURM but have installed maker locally rather than system-wide. My system administrator is going to try creating a larger locally mounted directory on some of the nodes for me but I wanted to check to see if you have any other suggestions to solve the issue or make sure that maker cleans up /tmp as aggressively as possible. > > I am using maker3-beta. > > Thanks for any help, > Ben > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From brubin at fieldmuseum.org Sun Jan 8 09:24:36 2017 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Sun, 8 Jan 2017 11:24:36 -0500 Subject: [maker-devel] /tmp full In-Reply-To: References: Message-ID: OK, thanks for the tips. Knowing the particulars of how SLURM might be causing this is extremely helpful. I'll try to just empty /tmp before running MAKER on each node, as you suggest. I suspect that will work but will work on getting MPI running as well. Thanks! Ben On Sat, Jan 7, 2017 at 6:29 PM, Carson Holt wrote: > If you use the MPI settings, then all processes will share a single > temporary directory, otherwise they each will have a separate one since > they can?t intercommunicate. > > MAKER tries to cleanup its files on finish or failure, but if you or the > system kill it with certain signals, then it is reaped immediately by the > system and not allowed to finish cleaning up. Signals 9 and 19 for example > will do that. 
> If a failure is related to the drive being full or a memory
> issue, then your system may be hitting it with one of these uncatchable
> signals. For example SLURM may use signal 9 or 19 if a process fails to
> respond to signal 15 in a timely manner (i.e. MAKER may be removing files,
> but SLURM gets impatient and kills it more aggressively because it thinks
> the process is not responding). You can always try and empty /tmp as the
> first step in your batch script, and it will remove files belonging to you
> before launching MAKER.
>
> --Carson
>
>
> > On Jan 6, 2017, at 6:22 PM, Benjamin Rubin wrote:
> >
> > Hi all,
> >
> > Maker keeps filling up the /tmp directories on the cluster I am using.
> > It appears that most of the space is taken with many versions of various
> > blast databases. I suspect that this issue is partly due to my not using
> > MPI and instead launching multiple instances of maker (typically 16) in the
> > same working directory. However, it appears that maker is also leaving some
> > of these databases in /tmp even after it has died or been killed and they
> > are piling up.
> >
> > I am submitting my jobs to the cluster via SLURM but have installed
> > maker locally rather than system-wide. My system administrator is going to
> > try creating a larger locally mounted directory on some of the nodes for me
> > but I wanted to check to see if you have any other suggestions to solve the
> > issue or make sure that maker cleans up /tmp as aggressively as possible.
> >
> > I am using maker3-beta.
> >
> > Thanks for any help,
> > Ben
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

--
_____________________________________________________
Benjamin ER Rubin, PhD
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605 USA
Office: (312) 665-7776
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lmainzer at life.illinois.edu Mon Jan 9 00:02:01 2017
From: lmainzer at life.illinois.edu (Liudmila Sergeevna Mainzer)
Date: Mon, 9 Jan 2017 01:02:01 -0600
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
Message-ID:

Hello, MAKER developers!

I tried submitting this bug report through the web form on the RepeatMasker web page, but I am getting an "invalid submission" message, so I decided to post here.

I found a weird bug that results in the notorious "index out of bounds" error reported by RepeatMasker. Significantly, this error only arises on very long file names generated by MAKER.

I traced this through the code, and identified the error to originate in Tandem Repeat finder. TRF sometimes splits up its output into separate files. When that happens, the pieces with index >1 do not contain the sequence name. Compare the first few lines between these two files:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html

InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html
File 1 of 2
     Tandem Repeats Finder Program written by:
                   Gary Benson
                   Program in Bioinformatics
                   Boston University
     Version 4.09
     Sequence: InputSequencefrag-1 CHUNK number:191 
     size:455659  offset:57300000
     
     Parameters: 2 3 5 75 20 33 7

etcetera
And the second chunk:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html

InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html
File 2 of 2
Found at i:56286 original size:1 final size:1
Alignment explanation (http://tandem.bu.edu/trf/trf.definitions.html#alignment)

Indices: 56278--56322  Score: 55
Period size: 1  Copynumber: 45.0  Consensus size: 1

etcetera

See how one file has the full header with the "Sequence:" statement and the other one does not? This "Sequence:" statement is used in the RepeatMasker code to name each piece of sequence that ends up being masked later. When this variable is empty (the name string is not defined), the setSubstr subroutine in the main RepeatMasker code breaks: the length of an undefined string is of course zero, and that subroutine has a check for sequences whose length is shorter than the region that needs to be masked.

So it quits with the statement "Error index out of bounds!", even though the sequence is finite length, does not have any weird characters, and is maskable.

Once again, this only arises on very long file names, and those seem to be created by MAKER. Example:
LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific

Notice how the last part of the file name has a bunch of identifiers separated by the %2E (generic URI-encoding)? I experimented with that file name. The path does not matter. The % signs do not matter. It is the length of the file name itself: if it is <108 characters, then RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like maybe Perl is not handling that long a name very well...

So the problem is three-fold: MAKER creates very long file names, TRF fails to write the file headers properly for those names, and RepeatMasker then breaks on the missing headers.

Would you provide any suggestions or patches for this problem? It is forcing us to run RepeatMasker separately, outside the main MAKER workflow, which really complicates the data management and analysis as a whole.

We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl v5.10.1 built for x86_64-linux-thread-multi.
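
The file-name length observation above can be checked before launching a run. What follows is only a minimal Python sketch, assuming the 108-character threshold reported in this message also applies to other RepeatMasker/TRF builds (that has not been confirmed). The constant and both function names are hypothetical helpers, not part of MAKER, RepeatMasker, or TRF.

```python
import os

# Assumption taken from the report above: names shorter than 108 characters
# ran fine, names of 108 characters or more triggered the error.
MAX_SAFE_NAME_LENGTH = 107


def is_name_safe(path, limit=MAX_SAFE_NAME_LENGTH):
    """Return True if the file name (ignoring the directory part) fits the limit."""
    return len(os.path.basename(path)) <= limit


def find_long_names(root, limit=MAX_SAFE_NAME_LENGTH):
    """Walk a directory tree and return the paths whose file name exceeds the limit."""
    too_long = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if len(name) > limit:
                too_long.append(os.path.join(dirpath, name))
    return sorted(too_long)
```

Only the base name is measured, since the report says the directory path does not matter. Running find_long_names on a MAKER output directory would list the candidates worth testing by hand.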
Many thanks in advance,
Liudmila Mainzer

----------------
Senior Research Scientist
National Center for Supercomputing Applications

Research Assistant Professor
Institute of Genomic Biology

University of Illinois
217-300-0568
1205 W. Clark St. Room 4026
Urbana, IL 61801

From carsonhh at gmail.com Mon Jan 9 09:30:09 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 9 Jan 2017 09:30:09 -0700
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
In-Reply-To:
References:
Message-ID: <733D5263-6CFC-4AB3-BFDD-30330B0E1985@gmail.com>

The name used by maker is based off of the input file name, so a quick fix would just be to rename your input file to have a shorter name.

--Carson

> On Jan 9, 2017, at 12:02 AM, Liudmila Sergeevna Mainzer wrote:
>
> Hello, MAKER developers!
>
> I tried submitting this bug report through the web form on the RepeatMasker web page, but I am getting an "invalid submission" message, so I decided to post here.
>
> I found a weird bug that results in the notorious "index out of bounds" error reported by RepeatMasker. Significantly, this error only arises on very long file names generated by MAKER.
>
> I traced this through the code, and identified the error to originate in Tandem Repeat finder. TRF sometimes splits up its output into separate files. When that happens, the pieces with index >1 do not contain the sequence name. Compare the first few lines between these two files:
>
> head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html
> InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html
> File 1 of 2
>    Tandem Repeats Finder Program written by:
>                  Gary Benson
>                  Program in Bioinformatics
>                  Boston University
>    Version 4.09
>    Sequence: InputSequencefrag-1 CHUNK number:191 
>    size:455659  offset:57300000
>    
>    Parameters: 2 3 5 75 20 33 7
> 
> etcetera
> But also the second chunk:
> 
> head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html
> InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html
> File 2 of 2
> Found at i:56286 original size:1 final size:1
> Alignment explanation (http://tandem.bu.edu/trf/trf.definitions.html#alignment)

> Indices: 56278--56322 Score: 55 > Period size: 1 Copynumber: 45.0 Consensus size: 1 > > etcetera > > > See how one file has the full header with the "Sequence:" statement and the other one does not? This "Sequence:" statement is used in the RepeatMasker code to name each piece of sequence that ends up being masked later. When this variable if empty (the name string is not defined), the setSubstr subroutine in the main RepeatMasker code breaks: length of an undefined string is of course zero, and that subroutine has a check for sequences whose length is shorter than the region that needs to be masked. > > So it quits with the statement "Error index out of bounds!", even though the sequence is finite length, does not have any weird characters, and is maskable. > > Once again, this only arises on very long file names, and those seem to be created by MAKER. Example: > LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific > > Notice how the last part of the file name has a bunch of identifiers separated by the %2E (generic URI-encoding)? I experimented with that file name. The path does not matter. The % signs do not matter. It is the length of the filename itself: if it is <108 characters, then RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like maybe Perl is not handling that long a name very well... > > So the problem is three-fold: MAKER creates file names that are very-very long, while RepeatMasker breaks due to TRF failing to write the file headers properly for those very long file names. > > Would you provide any suggestions or patches for this problem? It is forcing us to run RepeatMasker separately, outside the main MAKER worlflow, which really complicates the data management and analysis as a whole. 
> We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl v5.10.1 built for x86_64-linux-thread-multi.
>
> Many thanks in advance,
> Liudmila Mainzer
>
> ----------------
> Senior Research Scientist
> National Center for Supercomputing Applications
>
> Research Assistant Professor
> Institute of Genomic Biology
>
> University of Illinois
> 217-300-0568
> 1205 W. Clark St. Room 4026
> Urbana, IL 61801
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From qlian003 at ucr.edu Wed Jan 11 22:28:32 2017
From: qlian003 at ucr.edu (Qihua Liang)
Date: Wed, 11 Jan 2017 21:28:32 -0800
Subject: [maker-devel] gff file: possible sources
Message-ID: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>

Hi Maker development team!

I am trying to understand the second column of the gff file generated by maker, which should be the source of the annotation. The tutorial lists the following:

Possible Sources Include:
  BLASTN - BLASTN alignment of EST evidence
  BLASTX - BLASTX alignment of protein evidence
  TBLASTX - TBLASTX alignment of EST evidence from closely related organisms
  EST2Genome - Polished EST alignment from Exonerate
  Protein2Genome - Polished protein alignment from Exonerate
  SNAP - SNAP ab initio gene prediction
  GENEMARK - GeneMark ab initio gene prediction
  Augustus - Augustus ab initio gene prediction
  FgenesH - FGENESH ab initio gene prediction
  Repeatmasker - RepeatMasker identified repeat
  RepeatRunner - RepeatRunner identified repeat from the repeat protein database
  tRNAScan - tRNAScan-SE tRNA predictions (coming soon)
  PASA - PASA gene predictions (coming soon)

Besides these, I noticed other sources in my gff file, like cdna2genome. Is there any other detailed documentation explaining such sources beyond those listed above?

Thanks
Qihua
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From dence at genetics.utah.edu Thu Jan 12 06:28:24 2017 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 12 Jan 2017 13:28:24 +0000 Subject: [maker-devel] gff file: possible sources In-Reply-To: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu> References: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu> Message-ID: Hi Qihua, the cdna2genome is the polished tblastx alignments from Exonerate. Basically, the source column should be the name of the tool that generated the alignment, prediction, or gene model. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Jan 11, 2017, at 11:28 PM, Qihua Liang > wrote: Hi Maker develop team! I am trying to figure the second column of gff file generated by maker, which should be the source of this annotation. Besides of what the tutorial lists as, Possible Sources Include: * BLASTN - BLASTN alignment of EST evidence * BLASTX - BLASTX alignment of protein evidence * TBLASTX - TBLASTX alignment of EST evidence from closely related organisms * EST2Genome - Polished EST alignment from Exonerate * Protein2Genome - Polished protein alignment from Exonerate * SNAP - SNAP ab inito gene prediction * GENEMARK - GeneMarkab inito gene prediction * Augustus - Augustus ab inito gene prediction * FgenesH - FGENESH ab inito gene prediction * Repeatmasker - RepeatMasker identified repeat * RepeatRunner - RepeatRunner identified repeat from the repeat protein database * tRNAScan - tRNAScan-SE tRNA predictions (coming soon) * PASA - PASA gene predictions (coming soon) There are other sources that I noticed from my gff file, like cdna2genome. Is there any other detailed documentation explaining such sources besides of those listed above? 
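Thanks
Qihua
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From patel.kumar.vipul at gmail.com Fri Jan 20 01:44:26 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 09:44:26 +0100
Subject: [maker-devel] Maker crash for long chrm.
Message-ID:

Hi,

I hope someone can help me to figure out what is actually going wrong.

I installed Maker 2.31.9, MPICH, BioPerl 1.7 via CPAN, pointed the TMP variable not to use NFS. The given testcase as well for 1k 1MB runs without any problems.

Applying it to a sequence, for example with 57MB, it fails. I tried it as well with different sequences around 60MB, same outcome.

I looked into the logs, but they were not really helpful as they just stated that the job failed.

It crashed with the following message:

deleted:0 genes
substr outside of string at /usr/share/perl/5.18/Carp.pm line 165.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/Bio/Root/Root.pm:447
STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/Bio/Tools/CodonTable.pm:419
STACK: CGL::TranslationMachine::longest_translation_plus_stop programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280
STACK: maker::auto_annotator::get_translation_seq programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236
STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../lib/Widget/snap.pm:974
STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/snap.pm:690
STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194
STACK: Process::MpiChunk::_go programs/maker/maker/bin/../lib/Process/MpiChunk.pm:1469
STACK: Process::MpiChunk::run programs/maker/maker/bin/../lib/Process/MpiChunk.pm:341
STACK: programs/maker/maker/bin/maker:979
-----------------------------------------------------------
--> rank=16, hostname=dummy
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:chr_test

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:chr_test

examining contents of the fasta file and run log

--Next Contig--

Processing run.log file...

I got the same message if I run it without MPI, so I guess it is not an MPI issue.
How can I find out if some jobs died, which might have led to this problem?
Any other ideas how I can tackle this problem?

Kind regards
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From patel.kumar.vipul at gmail.com Fri Jan 20 06:34:28 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 14:34:28 +0100
Subject: [maker-devel] Maker crash for long chrm.
In-Reply-To:
References:
Message-ID:

Solved. After some digging and printing I found out the problem.

It was snap itself!

For anybody who runs into the same problem, check snap. Apparently it was not compiled correctly and therefore produced non-conforming output. Recompiling solved my issue.

Kind regards

2017-01-20 9:44 GMT+01:00 Vipul Patel :

> Hi,
>
> I hope someone can help me to figure out what is actually going wrong.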
> > I installed Maker 2.31.9, MPICH , BioPerl 1.7 via CPAN, pointed the TMP > variable not to use NFS. The given testcase as well for 1k 1MB runs without any problems. > > Applying it to a sequence, for example with 57MB it failes, I tried it as > well with a different sequences around 60MB, same outcome. > > I looked into the logs, but it was not really helpful as it was just > stated that the job failed > > It crashed with following message: > > deleted:0 genes > substr outside of string at /usr/share/perl/5.18/Carp.pm line 165. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Calling translate without a seq argument! > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/ > Bio/Root/Root.pm:447 > STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/ > Bio/Tools/CodonTable.pm:419 > STACK: CGL::TranslationMachine::longest_translation_plus_stop > programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280 > STACK: maker::auto_annotator::get_translation_seq > programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236 > STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../ > lib/Widget/snap.pm:974 > STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/ > snap.pm:690 > STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194 > STACK: Process::MpiChunk::_go programs/maker/maker/bin/../ > lib/Process/MpiChunk.pm:1469 > STACK: Process::MpiChunk::run programs/maker/maker/bin/../ > lib/Process/MpiChunk.pm:341 > STACK: programs/maker/maker/bin/maker:979 > ----------------------------------------------------------- > --> rank=16, hostname=dummy > ERROR: Failed while gathering ab-init output files > ERROR: Chunk failed at level:1, tier_type:2 > FAILED CONTIG:chr_test > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:chr_test > > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... 
> > I got the same message if I run it without MPI, So I can guess it is not > an MPI issue. > How can I find out if some jobs died so maybe this could lead to this > problem? > Other ideas how I can tackle this problem? > > Kind regards > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Jan 20 15:00:49 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 20 Jan 2017 15:00:49 -0700 Subject: [maker-devel] Maker crash for long chrm. In-Reply-To: References: Message-ID: <59841676-741F-496D-9E47-7750417033A4@gmail.com> I?m glad it?s working for you. Let us know if anything else comes up. ?Carson > On Jan 20, 2017, at 6:34 AM, Vipul Patel wrote: > > Solved. After some digging and printing I found out the problem. > > It was snap itself! > > For anybody who maybe runs in the same problem, check snap. Apparently it was not correctly compiled and therefore it produced a not conform output! Recompiling solved my issue. > > Kind regards > > 2017-01-20 9:44 GMT+01:00 Vipul Patel >: > Hi, > > I hope someone can help me to figure out what is actually going wrong. > > I installed Maker 2.31.9, MPICH , BioPerl 1.7 via CPAN, pointed the TMP variable not to use NFS. The given testcase as well for 1k > Applying it to a sequence, for example with 57MB it failes, I tried it as well with a different sequences around 60MB, same outcome. > > I looked into the logs, but it was not really helpful as it was just stated that the job failed > > It crashed with following message: > > deleted:0 genes > substr outside of string at /usr/share/perl/5.18/Carp.pm line 165. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Calling translate without a seq argument! 
> STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/Bio/Root/Root.pm:447 > STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/Bio/Tools/CodonTable.pm:419 > STACK: CGL::TranslationMachine::longest_translation_plus_stop programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280 > STACK: maker::auto_annotator::get_translation_seq programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236 > STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../lib/Widget/snap.pm:974 > STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/snap.pm:690 > STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194 > STACK: Process::MpiChunk::_go programs/maker/maker/bin/../lib/Process/MpiChunk.pm:1469 > STACK: Process::MpiChunk::run programs/maker/maker/bin/../lib/Process/MpiChunk.pm:341 > STACK: programs/maker/maker/bin/maker:979 > ----------------------------------------------------------- > --> rank=16, hostname=dummy > ERROR: Failed while gathering ab-init output files > ERROR: Chunk failed at level:1, tier_type:2 > FAILED CONTIG:chr_test > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:chr_test > > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... > > I got the same message if I run it without MPI, So I can guess it is not an MPI issue. > How can I find out if some jobs died so maybe this could lead to this problem? > Other ideas how I can tackle this problem? > > Kind regards > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mayabritstein at gmail.com Mon Jan 23 01:30:40 2017 From: mayabritstein at gmail.com (Maya Britstein) Date: Mon, 23 Jan 2017 10:30:40 +0200 Subject: [maker-devel] Authorization failed. 
Message-ID:

Hi,

I can't access the maker-devel archives. I am entering my email and what I think is my password, but it still doesn't work.

thanks,
Maya
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bmoore at genetics.utah.edu Mon Jan 23 05:43:53 2017
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon, 23 Jan 2017 12:43:53 +0000
Subject: [maker-devel] Authorization failed.
In-Reply-To:
References:
Message-ID:

Hi Maya,

If you follow the link below you will find at the bottom of the page a portion of the form that allows you to reset your password. It's a little misleading because it looks like it's only an "Unsubscribe" option, but it also takes you to a page that allows you to update your subscription details, including password reminder/reset. The actual text for the portion of the page you're looking for is this:

'To unsubscribe from maker-devel, get a password reminder, or change your subscription options enter your subscription email address:'

The link is: http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Regards,
Barry

On Jan 23, 2017, at 1:30 AM, Maya Britstein wrote:

Hi,

I can't access the maker-devel archives. I am entering my email, and what I think is my password, but still it doesn't work.

thanks,
Maya
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From daren.card at gmail.com Tue Jan 24 07:06:22 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Tue, 24 Jan 2017 08:06:22 -0600
Subject: [maker-devel] Maker error: Invalid nucleotide
Message-ID:

Hi everyone,

I'm getting an error with an ongoing Maker run that I'm trying to troubleshoot.
This is on a 2nd Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I'm getting the following type of error message for each contig:

...
Widget::augustus:
/opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus
#-------------------------------#

/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
Invalid nucleotide '8' encountered.

/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
Invalid nucleotide '8' encountered.

ERROR: Augustus failed
--> rank=7, hostname=moonunit0
ERROR: Failed while preparing ab-inits
ERROR: Chunk failed at level:0, tier_type:2
FAILED CONTIG:scaffold-92

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold-92

examining contents of the fasta file and run log
...

Augustus is apparently encountering '8' nucleotides, which is weird. I've looked within the contig fasta file in /tmp/ and there are no '8's anywhere except the header lines. Everything else appears to be running without issues.

Any guidance on how I might further interpret and solve this issue would be greatly appreciated. I can provide more information if necessary.

Thanks,
Daren Card

UT-Arlington

From carsonhh at gmail.com Wed Jan 25 10:37:50 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Jan 2017 10:37:50 -0700
Subject: [maker-devel] Maker error: Invalid nucleotide
In-Reply-To:
References:
Message-ID: <5E13AB7E-9175-4440-AD62-A53BD9DD8DE1@gmail.com>

Try running the contig in question (scaffold-92) as a separate MAKER run. That may help indicate whether the issue is a corrupt intermediate file (if it is, you can set clean_try=1 to force deletion of intermediate files before the rerun).

--Carson

> On Jan 24, 2017, at 7:06 AM, Daren C.
Card wrote: > > Hi everyone, > > I?m getting an error with an ongoing Maker run that I?m trying to troubleshoot. This is on a 2nd Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I?m getting the following type of error message for each contig: > > ? > Widget::augustus: > /opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus > #-------------------------------# > > /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR > Invalid nucleotide '8' encountered. > > > /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR > Invalid nucleotide '8' encountered. > > ERROR: Augustus failed > --> rank=7, hostname=moonunit0 > ERROR: Failed while preparing ab-inits > ERROR: Chunk failed at level:0, tier_type:2 > FAILED CONTIG:scaffold-92 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:scaffold-92 > > examining contents of the fasta file and run log > ? > > Augustus is apparently encountering ?8? nucleotides, which is weird. I?ve looked within the contig fasta file in /tmp/ and there are no ?8?s anywhere except the header lines. Everything else appears to be running without issues. > > Any guidance on how I might further interpret and solve this issue would be greatly appreciated. Can provide more information if necessary. 
> > Thanks, > Daren Card > > UT-Arlington > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From scott at scottcain.net Wed Jan 25 13:23:02 2017 From: scott at scottcain.net (Scott Cain) Date: Wed, 25 Jan 2017 15:23:02 -0500 Subject: [maker-devel] GFF3 file format In-Reply-To: References: Message-ID: Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein wrote: > Hi, > > I have RNA-seq data, and genomic data that I want to annotate using maker. > > From what I understood, I need to genarate a gff3 file format from the > RNA-seq mapping sequences. I had mapped the RNA sequences to the genome > using bowtie and tophat. However, I still do not know how to take these > format and convert them to a gff3 file that I can them use in maker as > annotation evidence > > I saw the wiki page, that did not mention how to make this conversion ( > http://gmod.org/wiki/GFF3) > > Can you please help me? > > Sincerely, > Maya > > ---- > Maya Britstein > Ph.D candidate > Laura Steindler's Lab > Marine Biology Department > Leon H. Charney School of Marine Sciences > University of Haifa, Israel > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cjfields at illinois.edu Wed Jan 25 15:03:51 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 25 Jan 2017 22:03:51 +0000 Subject: [maker-devel] GFF3 file format In-Reply-To: References: Message-ID: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly. But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. remove low confidence transcripts) and bring them as EST evidence. chris From: maker-devel > on behalf of Scott Cain > Date: Wednesday, January 25, 2017 at 2:23 PM To: Maya Britstein > Cc: "maker-devel at yandell-lab.org List" >, "help at gmod.org" > Subject: Re: [maker-devel] GFF3 file format Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein > wrote: Hi, I have RNA-seq data, and genomic data that I want to annotate using maker. From what I understood, I need to genarate a gff3 file format from the RNA-seq mapping sequences. I had mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to take these format and convert them to a gff3 file that I can them use in maker as annotation evidence I saw the wiki page, that did not mention how to make this conversion (http://gmod.org/wiki/GFF3) Can you please help me? Sincerely, Maya ---- Maya Britstein Ph.D candidate Laura Steindler's Lab Marine Biology Department Leon H. Charney School of Marine Sciences University of Haifa, Israel -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From qwzhang0601 at gmail.com Thu Jan 26 13:26:42 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Thu, 26 Jan 2017 15:26:42 -0500
Subject: [maker-devel] canonical protein sequences or isoform?
Message-ID:

Hello:

I am annotating a new genome and collecting proteins from mouse. I found there are both canonical protein sequences and isoforms. I wonder whether I should use only the canonical protein sequences, or both the canonical sequences and the isoforms?

Thanks

Best
Quanwei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rainer.rutka at uni-konstanz.de Fri Jan 27 03:31:40 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Fri, 27 Jan 2017 11:31:40 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
Message-ID:

Hi everybody.

My name is Rainer. I am an administrator for the HPC systems at our university in Konstanz, Baden-Wuerttemberg, Germany. The project is called bwHPC-C5. See: https://www.bwhpc-c5.de/en/index.php

I have been trying to get Maker running on our bwUniCluster for weeks. Unfortunately I get errors while running a Maker job in the MPI environment.

BUILD STATUS

==============================================================================
STATUS MAKER v2.31.9
==============================================================================
PERL Dependencies: VERIFIED
External Programs: VERIFIED
External C Libraries: VERIFIED
MPI SUPPORT: ENABLED
MWAS Web Interface: DISABLED
MAKER PACKAGE: CONFIGURATION OK

MODULES / INCLUDES / COMPILERS

# knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9
#
##### (B) Dependencies:
#
# conflict: any other maker version
# module load compiler/gnu/5.2
# module load mpi/openmpi/2.0-gnu-5.2
[...]

MPI/MOAB SUBMIT

[...]
### Queues ###
#MSUB -q fat
#MSUB -l nodes=1:ppn=16
#MSUB -l mem=20gb
#MSUB -l walltime=50:00:00
#
[...]
echo " "
echo "### Loading MAKER module:"
echo " "
module load bio/maker/2.31.9
[ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
echo "MAKER_VERSION = $MAKER_VERSION"
module list
[...]
echo " "
echo "### Running Maker example"
echo " "
export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so
export OMPI_MCA_mpi_warn_on_fork=0
echo "LD_PRELOAD=${LD_PRELOAD}"
#
# "STATUS: Processing and indexing input FASTA files..."
#
mpiexec -mca btl ^openib -n 16 maker
[...]

ERRORS
=======

[...]
LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[uc1n338:113607] *** Process received signal ***
[uc1n338:113607] Signal: Segmentation fault (11)
[uc1n338:113607] Signal code: Address not mapped (1)
[uc1n338:113607] Failing at address: 0x4b0
[uc1n338:113608] *** Process received signal ***
[uc1n338:113608] Signal: Segmentation fault (11)
[uc1n338:113608] Signal code: Address not mapped (1)
[uc1n338:113608] Failing at address: 0x4b0
[uc1n338:113621] *** Process received signal ***
[uc1n338:113621] Signal: Segmentation fault (11)
[uc1n338:113621] Signal code: Address not mapped (1)
[uc1n338:113621] Failing at address: 0x4b0
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[...]

What is wrong here?

Thank you for your help!

All the best,
Rainer

--
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 5055 bytes Desc: S/MIME Cryptographic Signature URL:

From michael.s.campbell1 at gmail.com Fri Jan 27 08:36:11 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 27 Jan 2017 10:36:11 -0500
Subject: [maker-devel] canonical protein sequences or isoform?
In-Reply-To:
References:
Message-ID:

I give MAKER all isoforms as evidence.

Mike

From qwzhang0601 at gmail.com Fri Jan 27 09:13:22 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Fri, 27 Jan 2017 11:13:22 -0500
Subject: [maker-devel] transcript assembly of RNA-seq data
Message-ID:

Hello:

I wonder what the best way is to make use of RNA-seq data for gene annotation of a new genome assembly.
(1) De novo assembly without mapping to any genome assembly (like Trinity)?
(2) TopHat+Cufflinks, mapping to the new genome assembly that I want to annotate?
(3) TopHat+Cufflinks, mapping to a closely related annotated genome (like mouse or human)?

Thanks

Best
Quanwei

From carsonhh at gmail.com Fri Jan 27 09:23:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 09:23:40 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To:
References:
Message-ID: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>

(1) De novo assembly without mapping to any genome assembly (like Trinity)

You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.

--Carson

From cjfields at illinois.edu Fri Jan 27 15:21:15 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jan 2017 22:21:15 +0000
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
References: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
Message-ID: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>

Yup, I agree. Carson, would you know of any instances where HISAT2/STAR+StringTie or reference-based Trinity assemblies were (successfully) used?

chris

From carsonhh at gmail.com Fri Jan 27 17:53:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 17:53:10 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
References: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com> <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
Message-ID:

No. My experience has just been with regular Trinity de novo assembly. Of course, I'd be interested in anyone else's attempt at this though.

--Carson

From carsonhh at gmail.com Sat Jan 28 13:53:45 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 28 Jan 2017 13:53:45 -0700
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To:
References:
Message-ID: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>

Try adding one of the following to your mpiexec command:

1. --mca btl ^openib
2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0

One or the other may fix your issue.
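Spelled out as complete command lines, the three options might look like the sketch below. It is a dry run that only prints the candidates and executes nothing. The `-n 16` and the bare `maker` call are taken from the job script in the original report; the interface names `ib0` and `eth0` come from the options above and may well be named differently on a given cluster. The `NPROCS` variable and the `mpi_variant` function are hypothetical helpers, not part of MAKER or OpenMPI. Note that the reported job script already passes `-mca btl ^openib`, so options 2 and 3 are the ones that script had not yet tried.

```shell
#!/usr/bin/env bash
# Dry run: print the three candidate mpiexec command lines so each can be
# copied into the job script and tried in turn. Nothing is executed here.
# Assumption: NPROCS should match the ppn value requested from the scheduler.

NPROCS="${NPROCS:-16}"

# mpi_variant N -> print the command line for option N (1, 2 or 3).
mpi_variant() {
    case "$1" in
        # Option 1: exclude the openib (InfiniBand) transport.
        1) echo "mpiexec --mca btl ^openib -n ${NPROCS} maker" ;;
        # Option 2: TCP over the InfiniBand interface (assumed to be ib0).
        2) echo "mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 -n ${NPROCS} maker" ;;
        # Option 3: TCP over the Ethernet interface (assumed to be eth0).
        3) echo "mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include eth0 -n ${NPROCS} maker" ;;
        *) echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

for v in 1 2 3; do
    mpi_variant "$v"
done
```

Which interface names actually exist on a node can be checked with `ip addr` before choosing between options 2 and 3.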
The first causes OpenMPI to not use the InfiniBand communication option (InfiniBand libraries use registered memory in a way that causes system calls to generate segfaults). It will usually force communication to go over another adapter. The second tries to use the InfiniBand adapter, but uses TCP over InfiniBand (a way to indirectly bypass the problem-causing libraries). The third specifically forces the use of the Ethernet adapter instead of the InfiniBand adapter.

--Carson

From rainer.rutka at uni-konstanz.de Mon Jan 30 01:32:08 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Mon, 30 Jan 2017 09:32:08 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
References: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
Message-ID:

Hi Carson!

Thank you VERY MUCH for your hints. Much appreciated!
I'll test these today and let you know about the results.

Again: THANKS! :-)

BTW: I'm not a scientist. Only a system operator. :-)

-- 
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413

From qwzhang0601 at gmail.com Tue Jan 31 10:36:13 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 12:36:13 -0500
Subject: [maker-devel] collecting protein sequences as evidences
Message-ID:

I wonder what the best way is to collect protein sequences for gene annotation of a de novo genome assembly.
(1) My first choice is to get human and mouse protein sequences from UniProt. At this step, I am not sure whether I should download the reviewed ones (i.e., Swiss-Prot) or the automatically annotated ones (i.e., TrEMBL).
(2) On the other hand, I also have protein sequences from NCBI. Should I simply merge those FASTA files? Does it matter if there are redundancies? Also, protein sequences from different sources may not have the same quality. Do I need to do anything before I integrate protein sequences from different sources?

Many thanks

Best
Quanwei

From qwzhang0601 at gmail.com Tue Jan 31 12:08:21 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 14:08:21 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals
Message-ID:

Hello:

I am trying to assemble transcripts from RNA-seq data with Trinity, to be used as evidence for gene annotation with MAKER. I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to MAKER?
Merging all samples into one would give a much higher coverage level, but I think some genes may be expressed as tissue-specific isoforms, so I am not sure whether I should merge RNA-seq data from different tissues.

What's more, I found some published RNA-seq data from another individual of the same species (and from a different tissue than ours). Should I merge all the RNA-seq data together (across individuals and tissues)? Or should I generate separate transcript assemblies and use all of them as input to MAKER?

Thanks
Best
Quanwei

From michael.s.campbell1 at gmail.com Tue Jan 31 12:26:29 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 14:26:29 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals
In-Reply-To:
References:
Message-ID: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>

I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER as a comma-separated list in the opts file.

Example:
est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta

Good luck,
Mike

From michael.s.campbell1 at gmail.com Tue Jan 31 13:57:28 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 15:57:28 -0500
Subject: [maker-devel] collecting protein sequences as evidences
In-Reply-To:
References:
Message-ID: <2E4D90C9-6D6E-4F52-A361-AFB06A61D2C2@gmail.com>

Hi Quanwei,

(1) When I use UniProt I use Swiss-Prot and not TrEMBL.
(2) I don't merge files together. I just pass them all to MAKER as a comma-separated list.

Thanks,
Mike

From cjfields at illinois.edu Tue Jan 31 14:05:43 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 21:05:43 +0000
Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals
In-Reply-To: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
Message-ID:

I agree with Mike. I also suggest not combining RNA-seq data from different runs (e.g. different studies) even if they are from the same tissue, developmental stage, etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc.) that can significantly and negatively impact transcript assembly quality.

chris

From mjfi2sb3 at gmail.com Tue Jan 31 14:14:14 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Tue, 31 Jan 2017 21:14:14 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
References: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
Message-ID:

Hi Christopher,

How would you identify a low-confidence transcript? And how do you remove them? Also, did you try setting a minimum read coverage in Trinity, as the default is one?

Best
/SB

On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, wrote:

> If I recall, from a BAM you would need to run a reference-based assembly
> on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use
> Trinity for ref-based assembly. But I always choose the route of a full de
> novo assembly (again, Trinity or similar) when possible, doing some basic
> cleanup (e.g. remove low confidence transcripts) and bring them as EST
> evidence.
> > chris > > From: maker-devel on behalf of > Scott Cain > Date: Wednesday, January 25, 2017 at 2:23 PM > To: Maya Britstein > Cc: "maker-devel at yandell-lab.org List" , " > help at gmod.org" > Subject: Re: [maker-devel] GFF3 file format > > Hi Maya, > > I'm not sure what MAKER's requirements are in this regard--I'm forwarding > this to their mailing list. > > Scott > > > On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein > wrote: > > Hi, > > I have RNA-seq data, and genomic data that I want to annotate using maker. > > From what I understood, I need to genarate a gff3 file format from the > RNA-seq mapping sequences. I had mapped the RNA sequences to the genome > using bowtie and tophat. However, I still do not know how to take these > format and convert them to a gff3 file that I can them use in maker as > annotation evidence > > I saw the wiki page, that did not mention how to make this conversion ( > http://gmod.org/wiki/GFF3 > > ) > > Can you please help me? > > Sincerely, > Maya > > ---- > Maya Britstein > Ph.D candidate > Laura Steindler's Lab > Marine Biology Department > Leon H. Charney School of Marine Sciences > University of Haifa, Israel > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/ > ) > 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- ____________________________ Sent from Inbox Mobile -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From qwzhang0601 at gmail.com Tue Jan 31 14:33:12 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 31 Jan 2017 16:33:12 -0500 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> Message-ID: Thank you guys for your suggestions. So you do not suggest to use RNA-seq data from another study, even I assemble them separately and then provide both assemblies into Maker as a comma separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. Best Quanwei 2017-01-31 16:05 GMT-05:00 Fields, Christopher J : > I agree with Mike. I also suggest not combining RNA-Seqs from different > runs (e.g. different studies) even if they are from the same tissue, > development stage etc. There are many other factors (biological variation, > sample quality, sequencing chemistry or technology differences, etc) that > can significantly and negatively impact trx assembly quality. > > chris > > On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" < > maker-devel-bounces at yandell-lab.org on behalf of > michael.s.campbell1 at gmail.com> wrote: > > I would probably try merging the replicates but not the tissues. You > can then pass the output files to MAKER in a comma separated list in the > opts file. > > Example: > est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta > > Good luck, > Mike > > > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > I am trying to assemble transcripts using RNA-seq data by the tool > Trinity, which will be used for gene annotation for Maker. Now I have data > from two tissues with two replicates each. Should I merge all four samples > to get one assembly file? 
Or should I merge replicates of each tissue > separately and use the two assembly files as input of Maker. Merging all > samples into one, we will have much higher coverage level, but I think > there may be some genes expressed by tissue-specific isoforms. So I not > sure whether I should merge RNA-seq from different tissues. > > What's more, I find some published RNA-seq data from another > individual (and also for different tissue from us) for the same species. > Should I merge all RNA-seq together (across individuals and tissues)? Or > should I generate different transcript assembly and use all those > assemblies as input to Maker? > > > > Thanks > > Best > > Quanwei > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_ > yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_ > yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Jan 31 14:35:20 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 31 Jan 2017 14:35:20 -0700 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> Message-ID: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> I think he means not to combine them for the transcript assembly preparation (i.e. assembly them separately). But you still provide them all to maker as a comma separated list. ?Carson > On Jan 31, 2017, at 2:33 PM, Quanwei Zhang wrote: > > Thank you guys for your suggestions. So you do not suggest to use RNA-seq data from another study, even I assemble them separately and then provide both assemblies into Maker as a comma separated list. 
The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. > > Best > Quanwei > > 2017-01-31 16:05 GMT-05:00 Fields, Christopher J >: > I agree with Mike. I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality. > > chris > > On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" on behalf of michael.s.campbell1 at gmail.com > wrote: > > I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file. > > Example: > est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta > > Good luck, > Mike > > > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues. > > What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker? 
> > > > Thanks > > Best > > Quanwei > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Tue Jan 31 16:05:43 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 31 Jan 2017 23:05:43 +0000 Subject: [maker-devel] GFF3 file format In-Reply-To: References: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> Message-ID: <8BD384C9-4E46-42AC-A59F-96299EF5E104@illinois.edu> You can use RSEM for some initial filtering: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification#filtering-transcripts Then I generally use the Trinity QA steps, in particular TransRate or DETONATE: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment chris From: Salim Bougouffa Date: Tuesday, January 31, 2017 at 3:14 PM To: Chris Fields , Scott Cain , Maya Britstein Cc: "maker-devel at yandell-lab.org List" , "help at gmod.org" Subject: Re: [maker-devel] GFF3 file format Hi Christopher, How would you identify a low confidence transcript? And how do you remove them? Also, did you try setting a minimum read coverage in Trinity as the default is one? Best /SB On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, > wrote: If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. 
Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly. But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. remove low confidence transcripts) and bring them as EST evidence. chris From: maker-devel > on behalf of Scott Cain > Date: Wednesday, January 25, 2017 at 2:23 PM To: Maya Britstein > Cc: "maker-devel at yandell-lab.org List" >, "help at gmod.org" > Subject: Re: [maker-devel] GFF3 file format Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein > wrote: Hi, I have RNA-seq data, and genomic data that I want to annotate using maker. From what I understood, I need to genarate a gff3 file format from the RNA-seq mapping sequences. I had mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to take these format and convert them to a gff3 file that I can them use in maker as annotation evidence I saw the wiki page, that did not mention how to make this conversion (http://gmod.org/wiki/GFF3) Can you please help me? Sincerely, Maya ---- Maya Britstein Ph.D candidate Laura Steindler's Lab Marine Biology Department Leon H. Charney School of Marine Sciences University of Haifa, Israel -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- ____________________________ Sent from Inbox Mobile -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cjfields at illinois.edu Tue Jan 31 16:07:44 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 31 Jan 2017 23:07:44 +0000 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> Message-ID: Exactly chris From: Carson Holt Date: Tuesday, January 31, 2017 at 3:35 PM To: Quanwei Zhang Cc: Chris Fields, "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals I think he means not to combine them for the transcript assembly preparation (i.e. assemble them separately). But you still provide them all to MAKER as a comma separated list. --Carson On Jan 31, 2017, at 2:33 PM, Quanwei Zhang wrote: Thank you guys for your suggestions. So you do not suggest using RNA-seq data from another study, even if I assemble them separately and then provide both assemblies to MAKER as a comma separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. Best Quanwei 2017-01-31 16:05 GMT-05:00 Fields, Christopher J: I agree with Mike. I also suggest not combining RNA-seq data from different runs (e.g. different studies) even if they are from the same tissue, developmental stage, etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc.) that can significantly and negatively impact transcript assembly quality. chris On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" (michael.s.campbell1 at gmail.com) wrote: I would probably try merging the replicates but not the tissues.
You can then pass the output files to MAKER in a comma separated list in the opts file. Example: est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta Good luck, Mike > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang wrote: > > Hello: > > I am trying to assemble transcripts from RNA-seq data with Trinity, to be used as evidence for gene annotation with MAKER. I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to MAKER? Merging all samples into one gives a much higher coverage level, but I think some genes may be expressed as tissue-specific isoforms, so I am not sure whether I should merge RNA-seq data from different tissues. > What's more, I found some published RNA-seq data from another individual (and from a different tissue than ours) for the same species. Should I merge all RNA-seq data together (across individuals and tissues)? Or should I generate separate transcript assemblies and use all of them as input to MAKER? > > Thanks > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
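As a rough illustration of the comma separated `est=` entry described in this thread, the sketch below builds that line from one assembly per tissue. The file names are hypothetical placeholders, and the helper is only a convenience for generating the text that would be pasted into the opts file (`maker_opts.ctl`).

```python
def est_option(fasta_paths):
    """Build the est= line from several transcript assembly FASTA files."""
    cleaned = [path.strip() for path in fasta_paths if path.strip()]
    # A comma inside a path would be read as a separator, so reject it early.
    if any("," in path for path in cleaned):
        raise ValueError("paths must not contain commas")
    return "est=" + ",".join(cleaned)


# Hypothetical per-tissue Trinity assemblies (replicates merged, tissues kept apart).
line = est_option([
    "/PATH/TO/tissueA_trinity.fasta",
    "/PATH/TO/tissueB_trinity.fasta",
])
print(line)
```

Keeping one file per tissue or study follows the advice above: assemble separately, then hand every assembly to MAKER in one list.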
URL: From rob.syme at gmail.com Wed Jan 4 23:41:25 2017 From: rob.syme at gmail.com (Rob Syme) Date: Thu, 05 Jan 2017 06:41:25 +0000 Subject: [maker-devel] Repeat library construction - CRL scripts Message-ID: Hi all The MAKER wiki page "Repeat Library Construction - Advanced " describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately. Does anybody know where to find them? Thanks! Rob Syme Research Associate Curtin University -------------- next part -------------- An HTML attachment was scrubbed... URL: From olegl at volcani.agri.gov.il Thu Jan 5 03:07:31 2017 From: olegl at volcani.agri.gov.il (Oleg Lovky) Date: Thu, 5 Jan 2017 10:07:31 +0000 Subject: [maker-devel] Unable to train SNAP Message-ID: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> Hello, I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000). However, I'm not getting transcript and proteins fasta files at all, despite Maker not giving any errors and everything is listed as finished in the datastore log file. Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files. Please advise. Regards, Oleg Lovky, MSc. Research Engineer Institute of Plant Sciences ARO, Volcani Center Cell: 054-4870319 [v95_15] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 16191 bytes Desc: image001.png URL: From michael.s.campbell1 at gmail.com Thu Jan 5 07:54:17 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Thu, 5 Jan 2017 09:54:17 -0500 Subject: [maker-devel] Repeat library construction - CRL scripts In-Reply-To: References: Message-ID: <3B3F80CA-BFA1-4F0E-A2F1-CA60E8496D5F@gmail.com> Hi Rob, There is a link near the bottom of that wiki page at the end of this line: "CRL and other custom scripts are available here." That points to this URL http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz Thanks, Mike > On Jan 5, 2017, at 1:41 AM, Rob Syme wrote: > > Hi all > > The MAKER wiki page "Repeat Library Construction - Advanced" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately? Does anybody know where to find them? > > Thanks! > > Rob Syme > Research Associate > Curtin University > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.syme at gmail.com Thu Jan 5 18:29:35 2017 From: rob.syme at gmail.com (Rob Syme) Date: Fri, 06 Jan 2017 01:29:35 +0000 Subject: [maker-devel] Repeat library construction - CRL scripts In-Reply-To: References: Message-ID: Oh dear. That's embarrassing for me! Sorry for the silly question. -r On Thu, 5 Jan 2017 at 14:41 Rob Syme wrote: > Hi all > > The MAKER wiki page "Repeat Library Construction - Advanced" > describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded > MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. > Are they distributed with MAKER or separately? Does anybody know where to > find them? > > Thanks!
> > Rob Syme > Research Associate > Curtin University > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Thu Jan 5 19:23:17 2017 From: xvazquezc at gmail.com (Xabier Vázquez-Campos) Date: Fri, 6 Jan 2017 13:23:17 +1100 Subject: [maker-devel] Unable to train SNAP In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> Message-ID: Are you using the -n option with maker2zff? You often get empty genome.ann and genome.dna files if you don't. On 5 January 2017 at 21:07, Oleg Lovky wrote: > Hello, > > > > I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) > containing ~50k reads (length ranges from 70 to 12000). > > However, I'm not getting transcript and proteins fasta files at all, > despite Maker not giving any errors and everything is listed as finished in > the datastore log file. > > Furthermore, when trying to use maker2zff I'm getting empty genome.ann and > genome.dna files. > > > > Please advise. > > > > Regards, > > > > Oleg Lovky, MSc. > > Research Engineer > > Institute of Plant Sciences > > ARO, Volcani Center > > Cell: 054-4870319 > > [image: v95_15] > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Xabier Vázquez-Campos, *PhD* *Research Associate* Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image001.png Type: image/png Size: 16191 bytes Desc: not available URL: From carsonhh at gmail.com Fri Jan 6 12:28:02 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 6 Jan 2017 12:28:02 -0700 Subject: [maker-devel] Unable to train SNAP In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> Message-ID: <8F65E561-7450-4B5A-8F1B-4E51C0D25BE2@gmail.com> The maker2zff script has a number of thresholds that gene models must reach; if none reach them, all models are filtered out. If you don't have protein evidence in the dataset, for example, then that filter may always be failing. You may just want to turn all filters off with the -n option as previously suggested. --Carson > On Jan 5, 2017, at 3:07 AM, Oleg Lovky wrote: > > Hello, > > I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000). > However, I'm not getting transcript and proteins fasta files at all, despite Maker not giving any errors and everything is listed as finished in the datastore log file. > Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files. > > Please advise. > > Regards, > > Oleg Lovky, MSc. > Research Engineer > Institute of Plant Sciences > ARO, Volcani Center > Cell: 054-4870319 > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
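A small sketch of the suggestion in this thread: build the `maker2zff` call with filtering switched off via `-n`, and check whether `genome.ann` came out empty. Only the `-n` flag and the output file names come from the messages above; the GFF file name, the helper names, and the placeholder file content are assumptions for illustration, and the command is printed rather than executed.

```python
import os
import tempfile


def maker2zff_command(gff3_path, no_filter=True):
    """Return the argument list for maker2zff, optionally with -n (no filtering)."""
    cmd = ["maker2zff"]
    if no_filter:
        cmd.append("-n")
    cmd.append(gff3_path)
    return cmd


def is_missing_or_empty(path):
    """True if the file does not exist or has zero bytes."""
    return not os.path.exists(path) or os.path.getsize(path) == 0


# Hypothetical merged GFF3 from a MAKER run.
cmd = maker2zff_command("my_genome.all.gff")
print(" ".join(cmd))

# Demonstrate the emptiness check on a scratch copy of genome.ann.
with tempfile.TemporaryDirectory() as workdir:
    ann = os.path.join(workdir, "genome.ann")
    open(ann, "w").close()                      # empty file, as in the report
    empty_before = is_missing_or_empty(ann)
    with open(ann, "w") as handle:              # placeholder content only
        handle.write(">contig1\nEinit\t201\t325\tmodel-1\n")
    empty_after = is_missing_or_empty(ann)
```

If `genome.ann` is still empty with `-n`, the GFF probably contains no gene models at all, which points back at the MAKER run rather than at the filters.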
URL: From kchilds at msu.edu Thu Jan 5 07:28:00 2017 From: kchilds at msu.edu (Childs, Kevin) Date: Thu, 5 Jan 2017 14:28:00 +0000 Subject: [maker-devel] Repeat library construction - CRL scripts In-Reply-To: References: Message-ID: <6AE4044B-9011-4421-A6F1-FE3B95BBB11D@msu.edu> Rob, The scripts can be found in a link at the bottom of this wiki page: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced Kevin Childs --- Kevin Childs, PhD Assistant Professor - Fixed Term Center for Genomics-Enabled Plant Science Plant Biology Department Michigan State University kchilds at msu.edu 517-775-2844 (m) 517-884-6926 (o) http://childslab.plantbiology.msu.edu > On Jan 5, 2017, at 1:41 AM, Rob Syme wrote: > > Hi all > > The MAKER wiki page "Repeat Library Construction - Advanced" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately. Does anybody know where to find them? > > Thanks! > > Rob Syme > Research Associate > Curtin University > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From brubin at fieldmuseum.org Fri Jan 6 18:22:10 2017 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Fri, 6 Jan 2017 20:22:10 -0500 Subject: [maker-devel] /tmp full Message-ID: Hi all, Maker keeps filling up the /tmp directories on the cluster I am using. It appears that most of the space is taken with many versions of various blast databases. I suspect that this issue is partly due to my not using MPI and instead launching multiple instances of maker (typically 16) in the same working directory. However, it appears that maker is also leaving some of these databases in /tmp even after it has died or been killed and they are piling up. 
I am submitting my jobs to the cluster via SLURM but have installed maker locally rather than system-wide. My system administrator is going to try creating a larger locally mounted directory on some of the nodes for me but I wanted to check to see if you have any other suggestions to solve the issue or make sure that maker cleans up /tmp as aggressively as possible. I am using maker3-beta. Thanks for any help, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Sat Jan 7 16:29:29 2017 From: carsonhh at gmail.com (Carson Holt) Date: Sat, 7 Jan 2017 16:29:29 -0700 Subject: [maker-devel] /tmp full In-Reply-To: References: Message-ID: If you use the MPI settings, then all processes will share a single temporary directory, otherwise they each will have a separate one since they can't intercommunicate. MAKER tries to clean up its files on finish or failure, but if you or the system kill it with certain signals, then it is reaped immediately by the system and not allowed to finish cleaning up. Signals 9 and 19 for example will do that. If a failure is related to the drive being full or a memory issue, then your system may be hitting it with one of these uncatchable signals. For example SLURM may use signal 9 or 19 if a process fails to respond to signal 15 in a timely manner (i.e. MAKER may be removing files, but SLURM gets impatient and kills it more aggressively because it thinks the process is not responding). You can always try to empty /tmp as the first step in your batch script, and it will remove files belonging to you before launching MAKER. --Carson > On Jan 6, 2017, at 6:22 PM, Benjamin Rubin wrote: > > Hi all, > > Maker keeps filling up the /tmp directories on the cluster I am using. It appears that most of the space is taken with many versions of various blast databases.
I suspect that this issue is partly due to my not using MPI and instead launching multiple instances of maker (typically 16) in the same working directory. However, it appears that maker is also leaving some of these databases in /tmp even after it has died or been killed and they are piling up. > > I am submitting my jobs to the cluster via SLURM but have installed maker locally rather than system-wide. My system administrator is going to try creating a larger locally mounted directory on some of the nodes for me but I wanted to check to see if you have any other suggestions to solve the issue or make sure that maker cleans up /tmp as aggressively as possible. > > I am using maker3-beta. > > Thanks for any help, > Ben > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From brubin at fieldmuseum.org Sun Jan 8 09:24:36 2017 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Sun, 8 Jan 2017 11:24:36 -0500 Subject: [maker-devel] /tmp full In-Reply-To: References: Message-ID: OK, thanks for the tips. Knowing the particulars of how SLURM might be causing this is extremely helpful. I'll try to just empty /tmp before running MAKER on each node, as you suggest. I suspect that will work but will work on getting MPI running as well. Thanks! Ben On Sat, Jan 7, 2017 at 6:29 PM, Carson Holt wrote: > If you use the MPI settings, then all processes will share a single > temporary directory, otherwise they each will have a separate one since > they can?t intercommunicate. > > MAKER tries to cleanup its files on finish or failure, but if you or the > system kill it with certain signals, then it is reaped immediately by the > system and not allowed to finish cleaning up. Signals 9 and 19 for example > will do that. 
If a failure is related to the drive being full or a memory > issue, then your system may be hitting it with one of these uncatchable > signals. For example SLURM may use signal 9 or 19 if a process fails to > respond to signal 15 in a timely manner (i.e. MAKER may be removing files, > but SLURM gets impatient and kills it more aggressively because it thinks > the process is not responding). You can always try and empty /tmp as the > first step in your batch script, and it will remove files belonging to you > before launching MAKER. > > ?Carson > > > > > > On Jan 6, 2017, at 6:22 PM, Benjamin Rubin > wrote: > > > > Hi all, > > > > Maker keeps filling up the /tmp directories on the cluster I am using. > It appears that most of the space is taken with many versions of various > blast databases. I suspect that this issue is partly due to my not using > MPI and instead launching multiple instances of maker (typically 16) in the > same working directory. However, it appears that maker is also leaving some > of these databases in /tmp even after it has died or been killed and they > are piling up. > > > > I am submitting my jobs to the cluster via SLURM but have installed > maker locally rather than system-wide. My system administrator is going to > try creating a larger locally mounted directory on some of the nodes for me > but I wanted to check to see if you have any other suggestions to solve the > issue or make sure that maker cleans up /tmp as aggressively as possible. > > > > I am using maker3-beta. 
> > > > Thanks for any help, > > Ben > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- _____________________________________________________ Benjamin ER Rubin, PhD Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lmainzer at life.illinois.edu Mon Jan 9 00:02:01 2017 From: lmainzer at life.illinois.edu (Liudmila Sergeevna Mainzer) Date: Mon, 9 Jan 2017 01:02:01 -0600 Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names Message-ID: Hello, MAKER developers! I tried submitting this bug report through the web form on the RepeatMasker web page, but I am getting an "invalid submission" message, so I decided to post here. I found a weird bug that results in the notorious "index out of bounds" error reported by RepeatMasker. Significantly, this error only arises on very long file names generated by MAKER. I traced this through the code, and identified the error to originate in Tandem Repeat finder. TRF sometimes splits up its output into separate files. When that happens, the pieces with index >1 do not contain the sequence name. Compare the first few lines between these two files: head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html
     Tandem Repeats Finder Program written by:
                   Gary Benson
                   Program in Bioinformatics
                   Boston University
     Version 4.09
     Sequence: InputSequencefrag-1 CHUNK number:191 
     size:455659  offset:57300000
     
     Parameters: 2 3 5 75 20 33 7

etcetera
But also the second chunk:

  head -n 20 
output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html
 
InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html target
     ="explanation">Alignment explanation

Indices: 56278--56322 Score: 55 Period size: 1 Copynumber: 45.0 Consensus size: 1 etcetera See how one file has the full header with the "Sequence:" statement and the other one does not? This "Sequence:" statement is used in the RepeatMasker code to name each piece of sequence that ends up being masked later. When this variable is empty (the name string is not defined), the setSubstr subroutine in the main RepeatMasker code breaks: the length of an undefined string is of course zero, and that subroutine has a check for sequences whose length is shorter than the region that needs to be masked. So it quits with the statement "Error index out of bounds!", even though the sequence is finite length, does not have any weird characters, and is maskable. Once again, this only arises on very long file names, and those seem to be created by MAKER. Example: LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific Notice how the last part of the file name has a bunch of identifiers separated by %2E (generic URI-encoding)? I experimented with that file name. The path does not matter. The % signs do not matter. It is the length of the filename itself: if it is <108 characters, then RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like maybe Perl is not handling that long a name very well... So the problem is three-fold: MAKER creates file names that are very long, TRF fails to write the file headers properly for those very long file names, and RepeatMasker breaks as a result. Would you provide any suggestions or patches for this problem? It is forcing us to run RepeatMasker separately, outside the main MAKER workflow, which really complicates the data management and analysis as a whole. We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl v5.10.1 built for x86_64-linux-thread-multi.
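The length observation above can be turned into a quick pre-flight check. This is only a sketch: the 108-character threshold is the value reported in this message for one particular setup, and the helper name and test names below are hypothetical.

```python
import os

# Threshold as reported in this message; it may differ on other systems.
REPORTED_LIMIT = 108


def name_too_long(path, limit=REPORTED_LIMIT):
    """True if the file name (not the directory part) has limit or more characters."""
    return len(os.path.basename(path)) >= limit


# Synthetic names on either side of the reported threshold.
short_name = os.path.join("some", "long", "directory", "a" * 107)
long_name = os.path.join("some", "long", "directory", "a" * 108)

short_flag = name_too_long(short_name)
long_flag = name_too_long(long_name)
print(short_flag, long_flag)
```

Because only the base name is measured, the directory depth of the MAKER datastore does not affect the result, which matches the experiment described above.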
Many thanks in advance, Liudmila Mainzer ---------------- Senior Research Scientist National Center for Supercomputing Applications Research Assistant Professor Institute of Genomic Biology University of Illinois 217-300-0568 1205 W. Clark St. Room 4026 Urbana, IL 61801 From carsonhh at gmail.com Mon Jan 9 09:30:09 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 9 Jan 2017 09:30:09 -0700 Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names In-Reply-To: References: Message-ID: <733D5263-6CFC-4AB3-BFDD-30330B0E1985@gmail.com> The name used by maker is based off of the input file name, so quick fix would just be to rename your input file to have a shorter name. ?Carson > On Jan 9, 2017, at 12:02 AM, Liudmila Sergeevna Mainzer wrote: > > Hello, MAKER developers! > > I tried submitting this bug report through the web form on the RepeatMasker web page, but I am getting an "invalid submission" message, so I decided to post here. > > I found a weird bug that results in the notorious "index out of bounds" error reported by RepeatMasker. Significantly, this error only arises on very long file names generated by MAKER. > > I traced this through the code, and identified the error to originate in Tandem Repeat finder. TRF sometimes splits up its output into separate files. When that happens, the pieces with index >1 do not contain the sequence name. Compare the first few lines between these two files: > > head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html > InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html > bgcolor="#File 1 of 2 FBF8BC">
>    Tandem Repeats Finder Program written by:
>                  Gary Benson
>                  Program in Bioinformatics
>                  Boston University
>    Version 4.09
>    Sequence: InputSequencefrag-1 CHUNK number:191 
>    size:455659  offset:57300000
>    
>    Parameters: 2 3 5 75 20 33 7
> 
> etcetera
> But also the second chunk:
> 
> head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html
> InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html 
>    bgcolor="#File 2 of 2 Found at i:56286 original size:1 final size:1
>        HREF="http://tandem.bu.edu/trf/trf.definitions.html#alignment"
>     target
>    ="explanation">Alignment explanation

> Indices: 56278--56322 Score: 55 > Period size: 1 Copynumber: 45.0 Consensus size: 1 > > etcetera > > > See how one file has the full header with the "Sequence:" statement and the other one does not? This "Sequence:" statement is used in the RepeatMasker code to name each piece of sequence that ends up being masked later. When this variable if empty (the name string is not defined), the setSubstr subroutine in the main RepeatMasker code breaks: length of an undefined string is of course zero, and that subroutine has a check for sequences whose length is shorter than the region that needs to be masked. > > So it quits with the statement "Error index out of bounds!", even though the sequence is finite length, does not have any weird characters, and is maskable. > > Once again, this only arises on very long file names, and those seem to be created by MAKER. Example: > LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific > > Notice how the last part of the file name has a bunch of identifiers separated by the %2E (generic URI-encoding)? I experimented with that file name. The path does not matter. The % signs do not matter. It is the length of the filename itself: if it is <108 characters, then RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like maybe Perl is not handling that long a name very well... > > So the problem is three-fold: MAKER creates file names that are very-very long, while RepeatMasker breaks due to TRF failing to write the file headers properly for those very long file names. > > Would you provide any suggestions or patches for this problem? It is forcing us to run RepeatMasker separately, outside the main MAKER worlflow, which really complicates the data management and analysis as a whole. 
> We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl v5.10.1 built for x86_64-linux-thread-multi. > > Many thanks in advance, > Liudmila Mainzer > > ---------------- > Senior Research Scientist > National Center for Supercomputing Applications > > Research Assistant Professor > Institute of Genomic Biology > > University of Illinois > 217-300-0568 > 1205 W. Clark St. Room 4026 > Urbana, IL 61801 > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From qlian003 at ucr.edu Wed Jan 11 22:28:32 2017 From: qlian003 at ucr.edu (Qihua Liang) Date: Wed, 11 Jan 2017 21:28:32 -0800 Subject: [maker-devel] gff file: possible sources Message-ID: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu> Hi Maker develop team! I am trying to figure out the second column of the GFF file generated by MAKER, which should be the source of each annotation. The tutorial lists the following. Possible Sources Include: * BLASTN - BLASTN alignment of EST evidence * BLASTX - BLASTX alignment of protein evidence * TBLASTX - TBLASTX alignment of EST evidence from closely related organisms * EST2Genome - Polished EST alignment from Exonerate * Protein2Genome - Polished protein alignment from Exonerate * SNAP - SNAP ab initio gene prediction * GENEMARK - GeneMark ab initio gene prediction * Augustus - Augustus ab initio gene prediction * FgenesH - FGENESH ab initio gene prediction * Repeatmasker - RepeatMasker identified repeat * RepeatRunner - RepeatRunner identified repeat from the repeat protein database * tRNAScan - tRNAScan-SE tRNA predictions (coming soon) * PASA - PASA gene predictions (coming soon) There are other sources that I noticed in my GFF file, like cdna2genome. Is there any more detailed documentation explaining such sources besides those listed above? Thanks Qihua -------------- next part -------------- An HTML attachment was scrubbed...
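To see which source values actually occur in a given MAKER GFF3 file, the second column can simply be tallied. The sketch below assumes standard nine-column, tab-separated GFF3 with an optional trailing `##FASTA` section; the sample records are made up for illustration.

```python
from collections import Counter
import io


def count_sources(gff_lines):
    """Count feature lines per source (column 2) in a GFF3 stream."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break                      # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue                   # skip blank lines, comments, directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[1]] += 1
    return counts


# Made-up example records (hypothetical IDs and coordinates).
GFF = io.StringIO(
    "##gff-version 3\n"
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "chr1\test2genome\texpressed_sequence_match\t100\t900\t50\t+\t.\tID=e1\n"
    "chr1\tcdna2genome\texpressed_sequence_match\t120\t880\t40\t+\t.\tID=c1\n"
    "chr1\tcdna2genome\tmatch_part\t120\t300\t40\t+\t.\tID=c1p;Parent=c1\n"
    "##FASTA\n"
    ">chr1\n"
    "ACGT\n"
)

counts = count_sources(GFF)
for source, n in sorted(counts.items()):
    print(source, n)
```

Any source that shows up in the tally but not in the tutorial list (such as cdna2genome here) is a candidate to ask about on the list.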
URL: From dence at genetics.utah.edu Thu Jan 12 06:28:24 2017 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 12 Jan 2017 13:28:24 +0000 Subject: [maker-devel] gff file: possible sources In-Reply-To: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu> References: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu> Message-ID: Hi Qihua, the cdna2genome is the polished tblastx alignments from Exonerate. Basically, the source column should be the name of the tool that generated the alignment, prediction, or gene model. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Jan 11, 2017, at 11:28 PM, Qihua Liang > wrote: Hi Maker develop team! I am trying to figure the second column of gff file generated by maker, which should be the source of this annotation. Besides of what the tutorial lists as, Possible Sources Include: * BLASTN - BLASTN alignment of EST evidence * BLASTX - BLASTX alignment of protein evidence * TBLASTX - TBLASTX alignment of EST evidence from closely related organisms * EST2Genome - Polished EST alignment from Exonerate * Protein2Genome - Polished protein alignment from Exonerate * SNAP - SNAP ab inito gene prediction * GENEMARK - GeneMarkab inito gene prediction * Augustus - Augustus ab inito gene prediction * FgenesH - FGENESH ab inito gene prediction * Repeatmasker - RepeatMasker identified repeat * RepeatRunner - RepeatRunner identified repeat from the repeat protein database * tRNAScan - tRNAScan-SE tRNA predictions (coming soon) * PASA - PASA gene predictions (coming soon) There are other sources that I noticed from my gff file, like cdna2genome. Is there any other detailed documentation explaining such sources besides of those listed above? 
Thanks Qihua _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From patel.kumar.vipul at gmail.com Fri Jan 20 01:44:26 2017 From: patel.kumar.vipul at gmail.com (Vipul Patel) Date: Fri, 20 Jan 2017 09:44:26 +0100 Subject: [maker-devel] Maker crash for long chrm. Message-ID: Hi, I hope someone can help me to figure out what is actually going wrong. I installed Maker 2.31.9, MPICH , BioPerl 1.7 via CPAN, pointed the TMP variable not to use NFS. The given testcase as well for 1k rank=16, hostname=dummy ERROR: Failed while gathering ab-init output files ERROR: Chunk failed at level:1, tier_type:2 FAILED CONTIG:chr_test ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:chr_test examining contents of the fasta file and run log --Next Contig-- Processing run.log file... I got the same message if I run it without MPI, So I can guess it is not an MPI issue. How can I find out if some jobs died so maybe this could lead to this problem? Other ideas how I can tackle this problem? Kind regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From patel.kumar.vipul at gmail.com Fri Jan 20 06:34:28 2017 From: patel.kumar.vipul at gmail.com (Vipul Patel) Date: Fri, 20 Jan 2017 14:34:28 +0100 Subject: [maker-devel] Maker crash for long chrm. In-Reply-To: References: Message-ID: Solved. After some digging and printing I found out the problem. It was snap itself! For anybody who maybe runs in the same problem, check snap. Apparently it was not correctly compiled and therefore it produced a not conform output! Recompiling solved my issue. Kind regards 2017-01-20 9:44 GMT+01:00 Vipul Patel : > Hi, > > I hope someone can help me to figure out what is actually going wrong. 
> > I installed Maker 2.31.9, MPICH , BioPerl 1.7 via CPAN, pointed the TMP > variable not to use NFS. The given testcase as well for 1k 1MB runs without any problems. > > Applying it to a sequence, for example with 57MB it failes, I tried it as > well with a different sequences around 60MB, same outcome. > > I looked into the logs, but it was not really helpful as it was just > stated that the job failed > > It crashed with following message: > > deleted:0 genes > substr outside of string at /usr/share/perl/5.18/Carp.pm line 165. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Calling translate without a seq argument! > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/ > Bio/Root/Root.pm:447 > STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/ > Bio/Tools/CodonTable.pm:419 > STACK: CGL::TranslationMachine::longest_translation_plus_stop > programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280 > STACK: maker::auto_annotator::get_translation_seq > programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236 > STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../ > lib/Widget/snap.pm:974 > STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/ > snap.pm:690 > STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194 > STACK: Process::MpiChunk::_go programs/maker/maker/bin/../ > lib/Process/MpiChunk.pm:1469 > STACK: Process::MpiChunk::run programs/maker/maker/bin/../ > lib/Process/MpiChunk.pm:341 > STACK: programs/maker/maker/bin/maker:979 > ----------------------------------------------------------- > --> rank=16, hostname=dummy > ERROR: Failed while gathering ab-init output files > ERROR: Chunk failed at level:1, tier_type:2 > FAILED CONTIG:chr_test > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:chr_test > > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... 
> > I got the same message if I run it without MPI, So I can guess it is not > an MPI issue. > How can I find out if some jobs died so maybe this could lead to this > problem? > Other ideas how I can tackle this problem? > > Kind regards > From carsonhh at gmail.com Fri Jan 20 15:00:49 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 20 Jan 2017 15:00:49 -0700 Subject: [maker-devel] Maker crash for long chrm. In-Reply-To: References: Message-ID: <59841676-741F-496D-9E47-7750417033A4@gmail.com> I'm glad it's working for you. Let us know if anything else comes up. --Carson > On Jan 20, 2017, at 6:34 AM, Vipul Patel wrote: > > Solved. After some digging and printing I found out the problem. > > It was snap itself! > > For anybody who maybe runs in the same problem, check snap. Apparently it was not correctly compiled and therefore it produced a not conform output! Recompiling solved my issue. > > Kind regards > > 2017-01-20 9:44 GMT+01:00 Vipul Patel: > Hi, > > I hope someone can help me to figure out what is actually going wrong. > > I installed Maker 2.31.9, MPICH , BioPerl 1.7 via CPAN, pointed the TMP variable not to use NFS. The given testcase as well for 1k 1MB runs without any problems. > > Applying it to a sequence, for example with 57MB it failes, I tried it as well with a different sequences around 60MB, same outcome. > > I looked into the logs, but it was not really helpful as it was just stated that the job failed > > It crashed with following message: > > deleted:0 genes > substr outside of string at /usr/share/perl/5.18/Carp.pm line 165. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Calling translate without a seq argument!
> STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/Bio/Root/Root.pm:447 > STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/Bio/Tools/CodonTable.pm:419 > STACK: CGL::TranslationMachine::longest_translation_plus_stop programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280 > STACK: maker::auto_annotator::get_translation_seq programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236 > STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../lib/Widget/snap.pm:974 > STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/snap.pm:690 > STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194 > STACK: Process::MpiChunk::_go programs/maker/maker/bin/../lib/Process/MpiChunk.pm:1469 > STACK: Process::MpiChunk::run programs/maker/maker/bin/../lib/Process/MpiChunk.pm:341 > STACK: programs/maker/maker/bin/maker:979 > ----------------------------------------------------------- > --> rank=16, hostname=dummy > ERROR: Failed while gathering ab-init output files > ERROR: Chunk failed at level:1, tier_type:2 > FAILED CONTIG:chr_test > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:chr_test > > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... > > I got the same message if I run it without MPI, So I can guess it is not an MPI issue. > How can I find out if some jobs died so maybe this could lead to this problem? > Other ideas how I can tackle this problem? > > Kind regards > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mayabritstein at gmail.com Mon Jan 23 01:30:40 2017 From: mayabritstein at gmail.com (Maya Britstein) Date: Mon, 23 Jan 2017 10:30:40 +0200 Subject: [maker-devel] Authorization failed. 
Message-ID: Hi, I can't access the maker-devel archives. I am entering my email, and what I think is my password, but still it doesn't work. thanks, Maya From bmoore at genetics.utah.edu Mon Jan 23 05:43:53 2017 From: bmoore at genetics.utah.edu (Barry Moore) Date: Mon, 23 Jan 2017 12:43:53 +0000 Subject: [maker-devel] Authorization failed. In-Reply-To: References: Message-ID: Hi Maya, If you follow the link below you will find at the bottom of the page a portion of the form that allows you to reset your password. It's a little misleading because it looks like it's only an "Unsubscribe" option, but it also takes you to a page that allows you to update your subscription details, including password reminder/reset. The actual text for the portion of the page you're looking for is this: 'To unsubscribe from maker-devel, get a password reminder, or change your subscription options enter your subscription email address:' The link is: http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Regards, Barry On Jan 23, 2017, at 1:30 AM, Maya Britstein wrote: Hi, I can't access the maker-devel archives. I am entering my email, and what I think is my password, but still it doesn't work. thanks, Maya _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From daren.card at gmail.com Tue Jan 24 07:06:22 2017 From: daren.card at gmail.com (Daren C. Card) Date: Tue, 24 Jan 2017 08:06:22 -0600 Subject: [maker-devel] Maker error: Invalid nucleotide Message-ID: Hi everyone, I'm getting an error with an ongoing Maker run that I'm trying to troubleshoot.
This is on a 2nd Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I'm getting the following type of error message for each contig: ... Widget::augustus: /opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus #-------------------------------# /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR Invalid nucleotide '8' encountered. /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR Invalid nucleotide '8' encountered. ERROR: Augustus failed --> rank=7, hostname=moonunit0 ERROR: Failed while preparing ab-inits ERROR: Chunk failed at level:0, tier_type:2 FAILED CONTIG:scaffold-92 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:scaffold-92 examining contents of the fasta file and run log ... Augustus is apparently encountering '8' nucleotides, which is weird. I've looked within the contig fasta file in /tmp/ and there are no '8's anywhere except the header lines. Everything else appears to be running without issues. Any guidance on how I might further interpret and solve this issue would be greatly appreciated. Can provide more information if necessary. Thanks, Daren Card UT-Arlington From carsonhh at gmail.com Wed Jan 25 10:37:50 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Jan 2017 10:37:50 -0700 Subject: [maker-devel] Maker error: Invalid nucleotide In-Reply-To: References: Message-ID: <5E13AB7E-9175-4440-AD62-A53BD9DD8DE1@gmail.com> Try running the contig in question (scaffold-92) as a separate MAKER run. That may help indicate whether the issue is a corrupt intermediate file (if it is, you can set clean_try=1 to force deletion of intermediate files before rerun). --Carson > On Jan 24, 2017, at 7:06 AM, Daren C.
Card wrote: > > Hi everyone, > > I?m getting an error with an ongoing Maker run that I?m trying to troubleshoot. This is on a 2nd Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I?m getting the following type of error message for each contig: > > ? > Widget::augustus: > /opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus > #-------------------------------# > > /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR > Invalid nucleotide '8' encountered. > > > /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR > Invalid nucleotide '8' encountered. > > ERROR: Augustus failed > --> rank=7, hostname=moonunit0 > ERROR: Failed while preparing ab-inits > ERROR: Chunk failed at level:0, tier_type:2 > FAILED CONTIG:scaffold-92 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:scaffold-92 > > examining contents of the fasta file and run log > ? > > Augustus is apparently encountering ?8? nucleotides, which is weird. I?ve looked within the contig fasta file in /tmp/ and there are no ?8?s anywhere except the header lines. Everything else appears to be running without issues. > > Any guidance on how I might further interpret and solve this issue would be greatly appreciated. Can provide more information if necessary. 
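A minimal sketch, not from the original posters, of one way to double-check the observation above that the '8' characters occur only in header lines. It scans the non-header lines of a FASTA file and reports any character outside an allowed set. The function name `find_invalid` and the allowed alphabet (IUPAC nucleotide codes in both cases, plus '-' and '*') are assumptions made for illustration; the exact set of characters that Augustus accepts is not stated in this thread.

```python
# Assumed alphabet: IUPAC nucleotide codes (upper and lower case) plus gap/stop marks.
ALLOWED = set("ACGTUNRYKMSWBDHVacgtunrykmswbdhv-*")


def find_invalid(lines):
    """Yield (line_number, column, character) for unexpected sequence characters.

    Header lines (starting with '>') are skipped, so digits in contig names
    such as 'scaffold-92' are never reported.
    """
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(">"):
            continue
        for col, ch in enumerate(line.rstrip("\r\n"), start=1):
            if ch not in ALLOWED:
                yield lineno, col, ch


# Tiny in-memory example; real use would pass an open file handle instead.
example = [">scaffold-8\n", "ACGTNNacgt\n", "AC8T\n"]
for hit in find_invalid(example):
    print(hit)
```

To apply it to a real file, open the masked contig file that appears in the failing Augustus command line and pass the file handle to `find_invalid`. An empty result would only show that the file on disk is clean at that moment; it would not by itself explain where Augustus saw the character.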
> > Thanks, > Daren Card > > UT-Arlington > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From scott at scottcain.net Wed Jan 25 13:23:02 2017 From: scott at scottcain.net (Scott Cain) Date: Wed, 25 Jan 2017 15:23:02 -0500 Subject: [maker-devel] GFF3 file format In-Reply-To: References: Message-ID: Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein wrote: > Hi, > > I have RNA-seq data, and genomic data that I want to annotate using maker. > > From what I understood, I need to genarate a gff3 file format from the > RNA-seq mapping sequences. I had mapped the RNA sequences to the genome > using bowtie and tophat. However, I still do not know how to take these > format and convert them to a gff3 file that I can them use in maker as > annotation evidence > > I saw the wiki page, that did not mention how to make this conversion ( > http://gmod.org/wiki/GFF3) > > Can you please help me? > > Sincerely, > Maya > > ---- > Maya Britstein > Ph.D candidate > Laura Steindler's Lab > Marine Biology Department > Leon H. Charney School of Marine Sciences > University of Haifa, Israel > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cjfields at illinois.edu Wed Jan 25 15:03:51 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 25 Jan 2017 22:03:51 +0000 Subject: [maker-devel] GFF3 file format In-Reply-To: References: Message-ID: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly. But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. remove low confidence transcripts) and bring them as EST evidence. chris From: maker-devel > on behalf of Scott Cain > Date: Wednesday, January 25, 2017 at 2:23 PM To: Maya Britstein > Cc: "maker-devel at yandell-lab.org List" >, "help at gmod.org" > Subject: Re: [maker-devel] GFF3 file format Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein > wrote: Hi, I have RNA-seq data, and genomic data that I want to annotate using maker. From what I understood, I need to genarate a gff3 file format from the RNA-seq mapping sequences. I had mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to take these format and convert them to a gff3 file that I can them use in maker as annotation evidence I saw the wiki page, that did not mention how to make this conversion (http://gmod.org/wiki/GFF3) Can you please help me? Sincerely, Maya ---- Maya Britstein Ph.D candidate Laura Steindler's Lab Marine Biology Department Leon H. Charney School of Marine Sciences University of Haifa, Israel -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From qwzhang0601 at gmail.com Thu Jan 26 13:26:42 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Thu, 26 Jan 2017 15:26:42 -0500 Subject: [maker-devel] canonical protein sequences or isoform? Message-ID: Hello: I am doing annotation on a new genome and collecting proteins from mouse. I found there are both canonical protein sequences and isoforms. I wonder whether I should use only canonical protein sequences or both the canonical sequences and the isoforms? Thanks Best Quanwei From rainer.rutka at uni-konstanz.de Fri Jan 27 03:31:40 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Fri, 27 Jan 2017 11:31:40 +0100 Subject: [maker-devel] Maker-Error when started with OpenMPI Message-ID: Hi everybody. My name is Rainer. I am an administrator for our HPC-Systems at our university in Konstanz, Baden-Wuerttemberg/Germany. The project is called bwHPC-C5. See: https://www.bwhpc-c5.de/en/index.php I have been trying to get Maker running on our bwUniCluster for weeks. Unfortunately I get errors while running a Maker job in the MPI-environment. BUILD STATUS ============================================================================== STATUS MAKER v2.31.9 ============================================================================== PERL Dependencies: VERIFIED External Programs: VERIFIED External C Libraries: VERIFIED MPI SUPPORT: ENABLED MWAS Web Interface: DISABLED MAKER PACKAGE: CONFIGURATION OK MODULES / INCLUDES / COMPILERS # knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9 # ##### (B) Dependencies: # # conflict: any other maker version # module load compiler/gnu/5.2 # module load mpi/openmpi/2.0-gnu-5.2 [...] MPI/MOAB SUBMIT [...]
### Queues ### #MSUB -q fat #MSUB -l nodes=1:ppn=16 #MSUB -l mem=20gb #MSUB -l walltime=50:00:00 # [...] echo " " echo "### Loading MAKER module:" echo " " module load bio/maker/2.31.9 [ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; } echo "MAKER_VERSION = $MAKER_VERSION" module list [...] echo " " echo "### Runing Maker example" echo " " export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so export OMPI_MCA_mpi_warn_on_fork=0 echo "LD_PRELOAD=${LD_PRELOAD}" # # "STATUS: Processing and indexing input FASTA files..." # mpiexec -mca btl ^openib -n 16 maker [...] E R R O R S ======= [...] LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... [uc1n338:113607] *** Process received signal *** [uc1n338:113607] Signal: Segmentation fault (11) [uc1n338:113607] Signal code: Address not mapped (1) [uc1n338:113607] Failing at address: 0x4b0 [uc1n338:113608] *** Process received signal *** [uc1n338:113608] Signal: Segmentation fault (11) [uc1n338:113608] Signal code: Address not mapped (1) [uc1n338:113608] Failing at address: 0x4b0 [uc1n338:113621] *** Process received signal *** [uc1n338:113621] Signal: Segmentation fault (11) [uc1n338:113621] Signal code: Address not mapped (1) [uc1n338:113621] Failing at address: 0x4b0 -------------------------------------------------------------------------- mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault). -------------------------------------------------------------------------- [...] WHATS WRONG HERE!? Thank you for your help! All the best , Rainer -- Rainer Rutka University of Konstanz Communication, Information, Media Centre (KIM) * High-Performance-Computing (HPC) * KIM-Support and -Base-Services Room: V511 78457 Konstanz, Germany +49 7531 88-5413 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5055 bytes Desc: S/MIME Cryptographic Signature URL: From michael.s.campbell1 at gmail.com Fri Jan 27 08:36:11 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Fri, 27 Jan 2017 10:36:11 -0500 Subject: [maker-devel] canonical protein sequences or isoform? In-Reply-To: References: Message-ID: I give MAKER all isoforms as evidence. Mike > On Jan 26, 2017, at 3:26 PM, Quanwei Zhang wrote: > > Hello: > > I am doing annotation on a new genome and collecting proteins from mouse. I found there are both canonical protein sequences and isoforms. I wonder whether I should use only canonical protein sequences or both the canonical sequences and the isoforms? > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From qwzhang0601 at gmail.com Fri Jan 27 09:13:22 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Fri, 27 Jan 2017 11:13:22 -0500 Subject: [maker-devel] transcript assembly of RNA-seq data Message-ID: Hello: I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly. (1) De novo assembly without mapping to any genome assembly (like Trinity)? (2) TopHat+Cufflinks mapping to the new genome assembly that I want to annotate? (3) TopHat+Cufflinks mapping to a closely related annotated genome (like mouse or human)? Thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Fri Jan 27 09:23:40 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 27 Jan 2017 09:23:40 -0700 Subject: [maker-devel] transcript assembly of RNA-seq data In-Reply-To: References: Message-ID: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com> (1) De novo assembly without mapping to any genome assembly (like Trinity) You get a lower false positive rate (TopHat+Cufflinks is too noisy). And protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity. --Carson > On Jan 27, 2017, at 9:13 AM, Quanwei Zhang wrote: > > Hello: > > I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly. > (1) De novo assembly without mapping to any genome assembly (like Trinity)? > (2) TopHat+Cufflinks mapping to the new genome assembly that I want to annotate? > (3) TopHat+Cufflinks mapping to a closely related annotated genome (like mouse or human)? > > Thanks > > Best > Quanwei > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From cjfields at illinois.edu Fri Jan 27 15:21:15 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jan 2017 22:21:15 +0000 Subject: [maker-devel] transcript assembly of RNA-seq data In-Reply-To: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com> References: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com> Message-ID: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu> Yup I agree. Carson, would you know of any instances where HISAT2/STAR+StringTie or reference-based Trinity assemblies were (successfully) used?
chris From: maker-devel on behalf of Carson Holt Date: Friday, January 27, 2017 at 10:23 AM To: Quanwei Zhang Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] transcript assembly of RNA-seq data (1) De novo assembly without mapping to any genome assembly (like Trinity) You get a lower false positive rate (TopHat+Cufflinks is too noisy). And protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity. --Carson On Jan 27, 2017, at 9:13 AM, Quanwei Zhang wrote: Hello: I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly. (1) De novo assembly without mapping to any genome assembly (like Trinity)? (2) TopHat+Cufflinks mapping to the new genome assembly that I want to annotate? (3) TopHat+Cufflinks mapping to a closely related annotated genome (like mouse or human)? Thanks Best Quanwei _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Fri Jan 27 17:53:10 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 27 Jan 2017 17:53:10 -0700 Subject: [maker-devel] transcript assembly of RNA-seq data In-Reply-To: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu> References: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com> <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu> Message-ID: No. My experience has just been with regular Trinity de novo assembly. Of course, I'd be interested in anyone else's attempt at this though. --Carson > On Jan 27, 2017, at 3:21 PM, Fields, Christopher J wrote: > > Yup I agree. Carson, would you know of any instances where HISAT2/STAR+StringTie or reference-based Trinity assemblies were (successfully) used?
> > chris > > From: maker-devel on behalf of Carson Holt > Date: Friday, January 27, 2017 at 10:23 AM > To: Quanwei Zhang > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] transcript assembly of RNA-seq data > >> (1) De novo assembly without mapping to any genome assembly (like Trinity) >> >> You get a lower false positive rate (TopHat+Cufflinks is too noisy). And protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity. >> >> --Carson >> >> >>> On Jan 27, 2017, at 9:13 AM, Quanwei Zhang wrote: >>> >>> Hello: >>> >>> I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly. >>> (1) De novo assembly without mapping to any genome assembly (like Trinity)? >>> (2) TopHat+Cufflinks mapping to the new genome assembly that I want to annotate? >>> (3) TopHat+Cufflinks mapping to a closely related annotated genome (like mouse or human)? >>> >>> Thanks >>> >>> Best >>> Quanwei >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > > From carsonhh at gmail.com Sat Jan 28 13:53:45 2017 From: carsonhh at gmail.com (Carson Holt) Date: Sat, 28 Jan 2017 13:53:45 -0700 Subject: [maker-devel] Maker-Error when started with OpenMPI In-Reply-To: References: Message-ID: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com> Try adding one of the following to your mpiexec command: 1. --mca btl ^openib 2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0 One or the other may fix your issue.
The first causes OpenMPI to not use the InfiniBand communication option (InfiniBand libraries use registered memory in a way that causes system calls to generate segfaults). It will usually force communication to go over another adapter. The second tries to use the InfiniBand adapter, but uses TCP over InfiniBand (a way to indirectly bypass the problem-causing libraries). The third specifically forces the use of the ethernet adapter instead of the InfiniBand adapter. --Carson > On Jan 27, 2017, at 3:31 AM, Rainer Rutka wrote: > > Hi everybody. > > My name is Rainer. I am an administrator for our HPC-Systems at our > university in Konstanz, Baden-Wuerttemberg/Germany. > The project is called bwHPC-C5. > > See: https://www.bwhpc-c5.de/en/index.php > > I have been trying to get Maker running on our bwUniCluster for weeks. Unfortunately > I get errors while running a Maker job in the MPI-environment. > > BUILD STATUS > > ============================================================================== > STATUS MAKER v2.31.9 > ============================================================================== > PERL Dependencies: VERIFIED > External Programs: VERIFIED > External C Libraries: VERIFIED > MPI SUPPORT: ENABLED > MWAS Web Interface: DISABLED > MAKER PACKAGE: CONFIGURATION OK > > MODULES / INCLUDES / COMPILERS > > # knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9 > # > ##### (B) Dependencies: > # > # conflict: any other maker version > # module load compiler/gnu/5.2 > # module load mpi/openmpi/2.0-gnu-5.2 > [...] > > MPI/MOAB SUBMIT > > [...] > ### Queues ### > #MSUB -q fat > #MSUB -l nodes=1:ppn=16 > #MSUB -l mem=20gb > #MSUB -l walltime=50:00:00 > # > [...] > echo " " > echo "### Loading MAKER module:" > echo " " > module load bio/maker/2.31.9 > [ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; } > echo "MAKER_VERSION = $MAKER_VERSION" > module list > [...]
> echo " " > echo "### Runing Maker example" > echo " " > export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so > export OMPI_MCA_mpi_warn_on_fork=0 > > echo "LD_PRELOAD=${LD_PRELOAD}" > # > # "STATUS: Processing and indexing input FASTA files..." > # > mpiexec -mca btl ^openib -n 16 maker > [...] > > > E R R O R S > ======= > [...] > LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > [uc1n338:113607] *** Process received signal *** > [uc1n338:113607] Signal: Segmentation fault (11) > [uc1n338:113607] Signal code: Address not mapped (1) > [uc1n338:113607] Failing at address: 0x4b0 > [uc1n338:113608] *** Process received signal *** > [uc1n338:113608] Signal: Segmentation fault (11) > [uc1n338:113608] Signal code: Address not mapped (1) > [uc1n338:113608] Failing at address: 0x4b0 > [uc1n338:113621] *** Process received signal *** > [uc1n338:113621] Signal: Segmentation fault (11) > [uc1n338:113621] Signal code: Address not mapped (1) > [uc1n338:113621] Failing at address: 0x4b0 > -------------------------------------------------------------------------- > mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault). > -------------------------------------------------------------------------- > [...] > > WHATS WRONG HERE!? > > Thank you for your help! 
> > All the best , > > Rainer > > -- > Rainer Rutka > University of Konstanz > Communication, Information, Media Centre (KIM) > * High-Performance-Computing (HPC) > * KIM-Support and -Base-Services > Room: V511 > 78457 Konstanz, Germany > +49 7531 88-5413 > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From rainer.rutka at uni-konstanz.de Mon Jan 30 01:32:08 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Mon, 30 Jan 2017 09:32:08 +0100 Subject: [maker-devel] Maker-Error when started with OpenMPI In-Reply-To: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com> References: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com> Message-ID: Hi Carson! Thank you VERY MUCH for your hints. Much appreciated! I'll test these today and let you know about the results. Again: THANKS! :-) BTW: I'm not a scientist. Only a system operator. :-) Am 28.01.2017 um 21:53 schrieb Carson Holt: > Try adding one of the following to your mpiexec command ?> > 1. --mca btl ^openib > 2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 > 3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0 > One or the other may fix your issue. The first causes OpenMPI to not use the infiniband communication option (infiniband libraries use registered memory in a way that causes system calls to generate segfaults). It will usually force communication to go over another adapter. The second tries to use the infiband adapter, but uses TCP over infiniband (way to indirectly bypass problem causing libraries). The third specifically forces the use of the ethernet adapter instead of infiniband adapter. 
> --Carson -- Rainer Rutka University of Konstanz Communication, Information, Media Centre (KIM) * High-Performance-Computing (HPC) * KIM-Support and -Base-Services Room: V511 78457 Konstanz, Germany +49 7531 88-5413 From qwzhang0601 at gmail.com Tue Jan 31 10:36:13 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 31 Jan 2017 12:36:13 -0500 Subject: [maker-devel] collecting protein sequences as evidences Message-ID: I wonder what's the best way to collect protein sequences for gene annotation of a de novo genome assembly. (1) My first choice is to get protein sequences of human and mouse from UniProt. At this step, I am not clear whether I should download the reviewed ones (i.e., Swiss-Prot) or the automatically annotated ones (i.e., TrEMBL). (2) On the other hand, I also get protein sequences from NCBI. Should I just simply merge those fasta files? Does it matter if there are redundancies? And also, if I get protein sequences from different sources, they may not have the same quality. Do I need to do something before I integrate protein sequences from different sources? Many thanks Best Quanwei From qwzhang0601 at gmail.com Tue Jan 31 12:08:21 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 31 Jan 2017 14:08:21 -0500 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals Message-ID: Hello: I am trying to assemble transcripts from RNA-seq data with Trinity, to be used for gene annotation with Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to Maker?
Merging all samples into one, we will have a much higher coverage level, but I think there may be some genes expressed as tissue-specific isoforms. So I am not sure whether I should merge RNA-seq from different tissues. What's more, I found some published RNA-seq data from another individual (and also from a different tissue than ours) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assemblies and use all those assemblies as input to Maker? Thanks Best Quanwei From michael.s.campbell1 at gmail.com Tue Jan 31 12:26:29 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Tue, 31 Jan 2017 14:26:29 -0500 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: References: Message-ID: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file. Example: est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta Good luck, Mike > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang wrote: > > Hello: > > I am trying to assemble transcripts from RNA-seq data with Trinity, to be used for gene annotation with Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to Maker? Merging all samples into one, we will have a much higher coverage level, but I think there may be some genes expressed as tissue-specific isoforms. So I am not sure whether I should merge RNA-seq from different tissues. > What's more, I found some published RNA-seq data from another individual (and also from a different tissue than ours) for the same species.
Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker? > > Thanks > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From michael.s.campbell1 at gmail.com Tue Jan 31 13:57:28 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Tue, 31 Jan 2017 15:57:28 -0500 Subject: [maker-devel] collecting protein sequences as evidences In-Reply-To: References: Message-ID: <2E4D90C9-6D6E-4F52-A361-AFB06A61D2C2@gmail.com> Hi Quanwei, (1) When I use UniProt I use Swiss-Prot and not TrEMBL. (2) I don't merge files together. I just pass them all to MAKER as a comma-separated list. Thanks, Mike > On Jan 31, 2017, at 12:36 PM, Quanwei Zhang wrote: > > I wonder what's the best way to collect protein sequences for gene annotation of a de novo genome assembly. > (1) My first choice is to get protein sequences of human and mouse from UniProt. At this step, I am not clear whether I should download the reviewed ones (i.e., SWISS-prot) or automatically annotated ones (i.e., TrEMBL). > (2) On the other hand, I also get protein sequences from NCBI, should I just simply merge those fasta files. Does it matter if there are redundancies? And also, if I get protein sequences from different sources, they may not have the same quality. Do I need to do something before I integrate protein sequences from different sources?
> > Many thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From cjfields at illinois.edu Tue Jan 31 14:05:43 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 31 Jan 2017 21:05:43 +0000 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> Message-ID: I agree with Mike. I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality. chris On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" wrote: I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file. Example: est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta Good luck, Mike > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang wrote: > > Hello: > > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues. 
> What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker? > > Thanks > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mjfi2sb3 at gmail.com Tue Jan 31 14:14:14 2017 From: mjfi2sb3 at gmail.com (Salim Bougouffa) Date: Tue, 31 Jan 2017 21:14:14 +0000 Subject: [maker-devel] GFF3 file format In-Reply-To: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> References: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> Message-ID: Hi Christopher, How would you identify a low confidence transcript? And how do you remove them? Also, did you try setting a minimum read coverage in Trinity as the default is one? Best /SB On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, wrote: > If I recall, from a BAM you would need to run a reference-based assembly > on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use > Trinity for ref-based assembly. But I always choose the route of a full de > novo assembly (again, Trinity or similar) when possible, doing some basic > cleanup (e.g. remove low confidence transcripts) and bring them as EST > evidence. 
> > chris > > From: maker-devel on behalf of > Scott Cain > Date: Wednesday, January 25, 2017 at 2:23 PM > To: Maya Britstein > Cc: "maker-devel at yandell-lab.org List" , " > help at gmod.org" > Subject: Re: [maker-devel] GFF3 file format > > Hi Maya, > > I'm not sure what MAKER's requirements are in this regard--I'm forwarding > this to their mailing list. > > Scott > > > On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein > wrote: > > Hi, > > I have RNA-seq data, and genomic data that I want to annotate using maker. > > From what I understood, I need to genarate a gff3 file format from the > RNA-seq mapping sequences. I had mapped the RNA sequences to the genome > using bowtie and tophat. However, I still do not know how to take these > format and convert them to a gff3 file that I can them use in maker as > annotation evidence > > I saw the wiki page, that did not mention how to make this conversion ( > http://gmod.org/wiki/GFF3 > > ) > > Can you please help me? > > Sincerely, > Maya > > ---- > Maya Britstein > Ph.D candidate > Laura Steindler's Lab > Marine Biology Department > Leon H. Charney School of Marine Sciences > University of Haifa, Israel > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/ > ) > 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- ____________________________ Sent from Inbox Mobile -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From qwzhang0601 at gmail.com Tue Jan 31 14:33:12 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 31 Jan 2017 16:33:12 -0500 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> Message-ID: Thank you guys for your suggestions. So you do not suggest using RNA-seq data from another study, even if I assemble them separately and then provide both assemblies to Maker as a comma-separated list? The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. Best Quanwei 2017-01-31 16:05 GMT-05:00 Fields, Christopher J : > I agree with Mike. I also suggest not combining RNA-Seqs from different > runs (e.g. different studies) even if they are from the same tissue, > development stage etc. There are many other factors (biological variation, > sample quality, sequencing chemistry or technology differences, etc) that > can significantly and negatively impact trx assembly quality. > > chris > > On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" < > maker-devel-bounces at yandell-lab.org on behalf of > michael.s.campbell1 at gmail.com> wrote: > > I would probably try merging the replicates but not the tissues. You > can then pass the output files to MAKER in a comma-separated list in the > opts file. > > Example: > est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta > > Good luck, > Mike > > > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > I am trying to assemble transcripts using RNA-seq data by the tool > Trinity, which will be used for gene annotation for Maker. Now I have data > from two tissues with two replicates each. Should I merge all four samples > to get one assembly file?
Or should I merge replicates of each tissue > separately and use the two assembly files as input of Maker. Merging all > samples into one, we will have much higher coverage level, but I think > there may be some genes expressed by tissue-specific isoforms. So I am not > sure whether I should merge RNA-seq from different tissues. > > What's more, I find some published RNA-seq data from another > individual (and also for different tissue from us) for the same species. > Should I merge all RNA-seq together (across individuals and tissues)? Or > should I generate different transcript assembly and use all those > assemblies as input to Maker? > > > > Thanks > > Best > > Quanwei > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_ > yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_ > yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Jan 31 14:35:20 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 31 Jan 2017 14:35:20 -0700 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> Message-ID: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> I think he means not to combine them for the transcript assembly preparation (i.e. assemble them separately). But you still provide them all to maker as a comma-separated list. --Carson > On Jan 31, 2017, at 2:33 PM, Quanwei Zhang wrote: > > Thank you guys for your suggestions. So you do not suggest using RNA-seq data from another study, even if I assemble them separately and then provide both assemblies to Maker as a comma-separated list?
The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. > > Best > Quanwei > > 2017-01-31 16:05 GMT-05:00 Fields, Christopher J >: > I agree with Mike. I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality. > > chris > > On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" on behalf of michael.s.campbell1 at gmail.com > wrote: > > I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file. > > Example: > est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta > > Good luck, > Mike > > > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues. > > What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker? 
> > > > Thanks > > Best > > Quanwei > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Tue Jan 31 16:05:43 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 31 Jan 2017 23:05:43 +0000 Subject: [maker-devel] GFF3 file format In-Reply-To: References: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> Message-ID: <8BD384C9-4E46-42AC-A59F-96299EF5E104@illinois.edu> You can use RSEM for some initial filtering: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification#filtering-transcripts Then I generally use the Trinity QA steps, in particular TransRate or DETONATE: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment chris From: Salim Bougouffa Date: Tuesday, January 31, 2017 at 3:14 PM To: Chris Fields , Scott Cain , Maya Britstein Cc: "maker-devel at yandell-lab.org List" , "help at gmod.org" Subject: Re: [maker-devel] GFF3 file format Hi Christopher, How would you identify a low confidence transcript? And how do you remove them? Also, did you try setting a minimum read coverage in Trinity as the default is one? Best /SB On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, > wrote: If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. 
Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly. But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. remove low confidence transcripts) and bring them as EST evidence. chris From: maker-devel > on behalf of Scott Cain > Date: Wednesday, January 25, 2017 at 2:23 PM To: Maya Britstein > Cc: "maker-devel at yandell-lab.org List" >, "help at gmod.org" > Subject: Re: [maker-devel] GFF3 file format Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein > wrote: Hi, I have RNA-seq data, and genomic data that I want to annotate using maker. From what I understood, I need to genarate a gff3 file format from the RNA-seq mapping sequences. I had mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to take these format and convert them to a gff3 file that I can them use in maker as annotation evidence I saw the wiki page, that did not mention how to make this conversion (http://gmod.org/wiki/GFF3) Can you please help me? Sincerely, Maya ---- Maya Britstein Ph.D candidate Laura Steindler's Lab Marine Biology Department Leon H. Charney School of Marine Sciences University of Haifa, Israel -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- ____________________________ Sent from Inbox Mobile -------------- next part -------------- An HTML attachment was scrubbed... 
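A minimal sketch of the expression-based cleanup Chris describes, for anyone who wants to try it outside the Trinity helper scripts. It assumes an RSEM isoforms.results table (tab-separated, with a header row containing a TPM column) and a FASTA file whose header IDs match the table's first column. The function name filter_by_tpm and the threshold of 1 in the usage line are illustrative assumptions; neither comes from this thread or from Trinity.

```shell
#!/usr/bin/env bash
# Sketch: keep only transcripts whose RSEM TPM is at or above a threshold.
# filter_by_tpm is a hypothetical helper, not part of Trinity, RSEM, or MAKER.
# Usage: filter_by_tpm RSEM.isoforms.results Trinity.fasta MIN_TPM > filtered.fasta

filter_by_tpm() {
    local results=$1 fasta=$2 min_tpm=$3
    awk -v min="$min_tpm" '
        # First file (RSEM table): find the TPM column from the header row,
        # then remember every transcript ID that meets the threshold.
        NR == FNR {
            if (FNR == 1) {
                for (i = 1; i <= NF; i++) if ($i == "TPM") col = i
                next
            }
            if (col && $col + 0 >= min + 0) keep[$1] = 1
            next
        }
        # Second file (FASTA): a header line switches printing on or off,
        # and sequence lines follow the state of their header.
        /^>/ { id = substr($1, 2); printing = (id in keep) }
        printing
    ' "$results" "$fasta"
}
```

For example, `filter_by_tpm RSEM.isoforms.results Trinity.fasta 1 > Trinity.filtered.fasta` would keep transcripts with TPM of at least 1. A sensible cutoff depends on sequencing depth, so treat that number as a placeholder.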
URL: From cjfields at illinois.edu Tue Jan 31 16:07:44 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 31 Jan 2017 23:07:44 +0000 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> Message-ID: Exactly chris From: Carson Holt Date: Tuesday, January 31, 2017 at 3:35 PM To: Quanwei Zhang Cc: Chris Fields, "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals I think he means not to combine them for the transcript assembly preparation (i.e. assemble them separately). But you still provide them all to maker as a comma-separated list. --Carson On Jan 31, 2017, at 2:33 PM, Quanwei Zhang wrote: Thank you guys for your suggestions. So you do not suggest using RNA-seq data from another study, even if I assemble them separately and then provide both assemblies to Maker as a comma-separated list? The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. Best Quanwei 2017-01-31 16:05 GMT-05:00 Fields, Christopher J: I agree with Mike. I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality. chris On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" (michael.s.campbell1 at gmail.com) wrote: I would probably try merging the replicates but not the tissues.
You can then pass the output files to MAKER in a comma separated list in the opts file. Example: est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta Good luck, Mike > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang > wrote: > > Hello: > > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues. > What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker? > > Thanks > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
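A minimal sketch of the outcome of this thread: assemble each tissue (and each study) separately, then hand every assembly to MAKER as one comma-separated est= entry in the opts file. The assembly file names below are hypothetical placeholders, and join_by_comma is an illustrative helper, not something shipped with MAKER.

```shell
#!/usr/bin/env bash
# Sketch: build the est= line for the MAKER opts file from separately
# assembled transcriptomes. File names are placeholders, not real paths.

# Join all arguments with commas (no spaces between entries).
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

est_line="est=$(join_by_comma /PATH/TO/tissueA_Trinity.fasta /PATH/TO/tissueB_Trinity.fasta)"
echo "$est_line"
```

The printed line can then be pasted over the existing est= entry in the opts file. An assembly from another study would be added as one more argument rather than being merged into an existing FASTA file.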
URL: From rob.syme at gmail.com Wed Jan 4 23:41:25 2017 From: rob.syme at gmail.com (Rob Syme) Date: Thu, 05 Jan 2017 06:41:25 +0000 Subject: [maker-devel] Repeat library construction - CRL scripts Message-ID: Hi all The MAKER wiki page "Repeat Library Construction - Advanced " describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately. Does anybody know where to find them? Thanks! Rob Syme Research Associate Curtin University -------------- next part -------------- An HTML attachment was scrubbed... URL: From olegl at volcani.agri.gov.il Thu Jan 5 03:07:31 2017 From: olegl at volcani.agri.gov.il (Oleg Lovky) Date: Thu, 5 Jan 2017 10:07:31 +0000 Subject: [maker-devel] Unable to train SNAP Message-ID: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> Hello, I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000). However, I'm not getting transcript and proteins fasta files at all, despite Maker not giving any errors and everything is listed as finished in the datastore log file. Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files. Please advise. Regards, Oleg Lovky, MSc. Research Engineer Institute of Plant Sciences ARO, Volcani Center Cell: 054-4870319 [v95_15] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 16191 bytes Desc: image001.png URL: From michael.s.campbell1 at gmail.com Thu Jan 5 07:54:17 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Thu, 5 Jan 2017 09:54:17 -0500 Subject: [maker-devel] Repeat library construction - CRL scripts In-Reply-To: References: Message-ID: <3B3F80CA-BFA1-4F0E-A2F1-CA60E8496D5F@gmail.com> Hi Rob, There is a link near the bottom of that wiki page at the end of this line: "CRL and other custom scripts are available here." That points to this URL: http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz Thanks, Mike > On Jan 5, 2017, at 1:41 AM, Rob Syme wrote: > > Hi all > > The MAKER wiki page "Repeat Library Construction - Advanced " describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately. Does anybody know where to find them? > > Thanks! > > Rob Syme > Research Associate > Curtin University > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.syme at gmail.com Thu Jan 5 18:29:35 2017 From: rob.syme at gmail.com (Rob Syme) Date: Fri, 06 Jan 2017 01:29:35 +0000 Subject: [maker-devel] Repeat library construction - CRL scripts In-Reply-To: References: Message-ID: Oh dear. That's embarrassing for me! Sorry for the silly question. -r On Thu, 5 Jan 2017 at 14:41 Rob Syme wrote: > Hi all > > The MAKER wiki page "Repeat Library Construction - Advanced > " > describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded > MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. > Are they distributed with MAKER or separately. Does anybody know where to > find them? > > Thanks!
> > Rob Syme > Research Associate > Curtin University > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Thu Jan 5 19:23:17 2017 From: xvazquezc at gmail.com (Xabier Vázquez-Campos) Date: Fri, 6 Jan 2017 13:23:17 +1100 Subject: [maker-devel] Unable to train SNAP In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> Message-ID: Are you using the -n option with maker2zff? You often get empty genome.ann and genome.dna files if you don't. On 5 January 2017 at 21:07, Oleg Lovky wrote: > Hello, > > > > I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) > containing ~50k reads (length ranges from 70 to 12000). > > However, I'm not getting transcript and proteins fasta files at all, > despite Maker not giving any errors and everything is listed as finished in > the datastore log file. > > Furthermore, when trying to use maker2zff I'm getting empty genome.ann and > genome.dna files. > > > > Please advise. > > > > Regards, > > > > Oleg Lovky, MSc. > > Research Engineer > > Institute of Plant Sciences > > ARO, Volcani Center > > Cell: 054-4870319 > > [image: v95_15] > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Xabier Vázquez-Campos, *PhD* *Research Associate* Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: image001.png Type: image/png Size: 16191 bytes Desc: not available URL: From carsonhh at gmail.com Fri Jan 6 12:28:02 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 6 Jan 2017 12:28:02 -0700 Subject: [maker-devel] Unable to train SNAP In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local> Message-ID: <8F65E561-7450-4B5A-8F1B-4E51C0D25BE2@gmail.com> The maker2zff script has a number of thresholds that must be reached to avoid filtering all models. If you don't have protein evidence in the dataset for example, then that filter may always be failing. You may just want to turn all filters off with the -n option as previously suggested. --Carson > On Jan 5, 2017, at 3:07 AM, Oleg Lovky wrote: > > Hello, > > I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000). > However, I'm not getting transcript and proteins fasta files at all, despite Maker not giving any errors and everything is listed as finished in the datastore log file. > Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files. > > Please advise. > > Regards, > > Oleg Lovky, MSc. > Research Engineer > Institute of Plant Sciences > ARO, Volcani Center > Cell: 054-4870319 > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
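A minimal sketch of a sanity check to go with the advice above, assuming maker2zff writes genome.ann and genome.dna into the current directory as described in this thread. check_zff is an illustrative helper, not part of MAKER, and the GFF3 file name in the commented usage is a placeholder. The assumption that maker2zff takes the merged GFF3 as its argument is mine and should be checked against the script's own usage message.

```shell
#!/usr/bin/env bash
# Sketch: report whether the maker2zff outputs exist and are non-empty
# before trying to train SNAP on them. check_zff is a hypothetical helper.

check_zff() {
    local dir=${1:-.}
    local f status=0
    for f in genome.ann genome.dna; do
        if [ -s "$dir/$f" ]; then
            echo "$f: ok"
        else
            echo "$f: missing or empty"
            status=1
        fi
    done
    return "$status"
}

# Typical use (left as comments because it needs MAKER installed):
#   maker2zff -n my_genome.all.gff   # -n: turn all filters off, as suggested above
#   check_zff . || echo "no models survived; check that the GFF3 contains gene models"
```

If the files are still empty with -n, the filters are unlikely to be the cause. The more probable explanation is that the run produced no gene models at all.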
URL: From kchilds at msu.edu Thu Jan 5 07:28:00 2017 From: kchilds at msu.edu (Childs, Kevin) Date: Thu, 5 Jan 2017 14:28:00 +0000 Subject: [maker-devel] Repeat library construction - CRL scripts In-Reply-To: References: Message-ID: <6AE4044B-9011-4421-A6F1-FE3B95BBB11D@msu.edu> Rob, The scripts can be found in a link at the bottom of this wiki page: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced Kevin Childs --- Kevin Childs, PhD Assistant Professor - Fixed Term Center for Genomics-Enabled Plant Science Plant Biology Department Michigan State University kchilds at msu.edu 517-775-2844 (m) 517-884-6926 (o) http://childslab.plantbiology.msu.edu > On Jan 5, 2017, at 1:41 AM, Rob Syme wrote: > > Hi all > > The MAKER wiki page "Repeat Library Construction - Advanced" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately. Does anybody know where to find them? > > Thanks! > > Rob Syme > Research Associate > Curtin University > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From brubin at fieldmuseum.org Fri Jan 6 18:22:10 2017 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Fri, 6 Jan 2017 20:22:10 -0500 Subject: [maker-devel] /tmp full Message-ID: Hi all, Maker keeps filling up the /tmp directories on the cluster I am using. It appears that most of the space is taken with many versions of various blast databases. I suspect that this issue is partly due to my not using MPI and instead launching multiple instances of maker (typically 16) in the same working directory. However, it appears that maker is also leaving some of these databases in /tmp even after it has died or been killed and they are piling up. 
I am submitting my jobs to the cluster via SLURM but have installed maker locally rather than system-wide. My system administrator is going to try creating a larger locally mounted directory on some of the nodes for me but I wanted to check to see if you have any other suggestions to solve the issue or make sure that maker cleans up /tmp as aggressively as possible. I am using maker3-beta. Thanks for any help, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Sat Jan 7 16:29:29 2017 From: carsonhh at gmail.com (Carson Holt) Date: Sat, 7 Jan 2017 16:29:29 -0700 Subject: [maker-devel] /tmp full In-Reply-To: References: Message-ID: If you use the MPI settings, then all processes will share a single temporary directory, otherwise they each will have a separate one since they can't intercommunicate. MAKER tries to clean up its files on finish or failure, but if you or the system kill it with certain signals, then it is reaped immediately by the system and not allowed to finish cleaning up. Signals 9 and 19 for example will do that. If a failure is related to the drive being full or a memory issue, then your system may be hitting it with one of these uncatchable signals. For example SLURM may use signal 9 or 19 if a process fails to respond to signal 15 in a timely manner (i.e. MAKER may be removing files, but SLURM gets impatient and kills it more aggressively because it thinks the process is not responding). You can always try and empty /tmp as the first step in your batch script, and it will remove files belonging to you before launching MAKER. --Carson > On Jan 6, 2017, at 6:22 PM, Benjamin Rubin wrote: > > Hi all, > > Maker keeps filling up the /tmp directories on the cluster I am using. It appears that most of the space is taken with many versions of various blast databases.
I suspect that this issue is partly due to my not using MPI and instead launching multiple instances of maker (typically 16) in the same working directory. However, it appears that maker is also leaving some of these databases in /tmp even after it has died or been killed and they are piling up. > > I am submitting my jobs to the cluster via SLURM but have installed maker locally rather than system-wide. My system administrator is going to try creating a larger locally mounted directory on some of the nodes for me but I wanted to check to see if you have any other suggestions to solve the issue or make sure that maker cleans up /tmp as aggressively as possible. > > I am using maker3-beta. > > Thanks for any help, > Ben > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From brubin at fieldmuseum.org Sun Jan 8 09:24:36 2017 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Sun, 8 Jan 2017 11:24:36 -0500 Subject: [maker-devel] /tmp full In-Reply-To: References: Message-ID: OK, thanks for the tips. Knowing the particulars of how SLURM might be causing this is extremely helpful. I'll try to just empty /tmp before running MAKER on each node, as you suggest. I suspect that will work but will work on getting MPI running as well. Thanks! Ben On Sat, Jan 7, 2017 at 6:29 PM, Carson Holt wrote: > If you use the MPI settings, then all processes will share a single > temporary directory, otherwise they each will have a separate one since > they can't intercommunicate. > > MAKER tries to clean up its files on finish or failure, but if you or the > system kill it with certain signals, then it is reaped immediately by the > system and not allowed to finish cleaning up. Signals 9 and 19 for example > will do that.
If a failure is related to the drive being full or a memory > issue, then your system may be hitting it with one of these uncatchable > signals. For example SLURM may use signal 9 or 19 if a process fails to > respond to signal 15 in a timely manner (i.e. MAKER may be removing files, > but SLURM gets impatient and kills it more aggressively because it thinks > the process is not responding). You can always try and empty /tmp as the > first step in your batch script, and it will remove files belonging to you > before launching MAKER. > > --Carson > > > > > > On Jan 6, 2017, at 6:22 PM, Benjamin Rubin > wrote: > > > > Hi all, > > > > Maker keeps filling up the /tmp directories on the cluster I am using. > It appears that most of the space is taken with many versions of various > blast databases. I suspect that this issue is partly due to my not using > MPI and instead launching multiple instances of maker (typically 16) in the > same working directory. However, it appears that maker is also leaving some > of these databases in /tmp even after it has died or been killed and they > are piling up. > > > > I am submitting my jobs to the cluster via SLURM but have installed > maker locally rather than system-wide. My system administrator is going to > try creating a larger locally mounted directory on some of the nodes for me > but I wanted to check to see if you have any other suggestions to solve the > issue or make sure that maker cleans up /tmp as aggressively as possible. > > > > I am using maker3-beta.
> > > > Thanks for any help, > > Ben > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- _____________________________________________________ Benjamin ER Rubin, PhD Committee on Evolutionary Biology University of Chicago benrubin.org Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lmainzer at life.illinois.edu Mon Jan 9 00:02:01 2017 From: lmainzer at life.illinois.edu (Liudmila Sergeevna Mainzer) Date: Mon, 9 Jan 2017 01:02:01 -0600 Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names Message-ID: Hello, MAKER developers! I tried submitting this bug report through the web form on the RepeatMasker web page, but I am getting an "invalid submission" message, so I decided to post here. I found a weird bug that results in the notorious "index out of bounds" error reported by RepeatMasker. Significantly, this error only arises on very long file names generated by MAKER. I traced this through the code, and identified the error to originate in Tandem Repeat finder. TRF sometimes splits up its output into separate files. When that happens, the pieces with index >1 do not contain the sequence name. Compare the first few lines between these two files: head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html
     Tandem Repeats Finder Program written by:
                   Gary Benson
                   Program in Bioinformatics
                   Boston University
     Version 4.09
     Sequence: InputSequencefrag-1 CHUNK number:191 
     size:455659  offset:57300000
     
     Parameters: 2 3 5 75 20 33 7

etcetera
But also the second chunk:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html
 
InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html target
     ="explanation">Alignment explanation

Indices: 56278--56322 Score: 55 Period size: 1 Copynumber: 45.0 Consensus size: 1 etcetera See how one file has the full header with the "Sequence:" statement and the other one does not? This "Sequence:" statement is used in the RepeatMasker code to name each piece of sequence that ends up being masked later. When this variable is empty (the name string is not defined), the setSubstr subroutine in the main RepeatMasker code breaks: the length of an undefined string is of course zero, and that subroutine has a check for sequences whose length is shorter than the region that needs to be masked. So it quits with the statement "Error index out of bounds!", even though the sequence is of finite length, does not have any weird characters, and is maskable. Once again, this only arises on very long file names, and those seem to be created by MAKER. Example: LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific Notice how the last part of the file name has a bunch of identifiers separated by %2E (the URI percent-encoding of ".")? I experimented with that file name. The path does not matter. The % signs do not matter. It is the length of the filename itself: if it is <108 characters, then RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like maybe Perl is not handling that long a name very well... So the problem is three-fold: MAKER creates file names that are very long, while RepeatMasker breaks due to TRF failing to write the file headers properly for those very long file names. Could you provide any suggestions or patches for this problem? It is forcing us to run RepeatMasker separately, outside the main MAKER workflow, which really complicates the data management and analysis as a whole. We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl v5.10.1 built for x86_64-linux-thread-multi.
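The two symptoms described above (a file name of 108 or more characters, and a TRF report with no "Sequence:" line near the top) can be checked for before or after a run. What follows is only a minimal Python sketch and is not part of MAKER, RepeatMasker, or TRF. The function names and the 20-line window (borrowed from the head -n 20 calls above) are assumptions made for illustration, and the 108-character threshold is simply the value observed in this report, not a documented limit.

```python
import os

# Threshold observed in this report; an empirical value, not a documented
# limit of RepeatMasker, TRF, or Perl.
NAME_LIMIT = 108


def name_too_long(path, limit=NAME_LIMIT):
    """True if the file name alone (directories ignored) has `limit` or more characters."""
    return len(os.path.basename(path)) >= limit


def has_sequence_header(report_text, max_lines=20):
    """True if a 'Sequence:' statement appears within the first `max_lines` lines."""
    head = report_text.splitlines()[:max_lines]
    return any("Sequence:" in line for line in head)


def report_file_has_header(path, max_lines=20):
    """Read a TRF report file (hypothetical helper) and run the header check on it."""
    with open(path, errors="replace") as handle:
        return has_sequence_header(handle.read(), max_lines)
```

Running name_too_long over the file names in a datastore directory would flag candidates for renaming, and report_file_has_header over the *.txt.html pieces would show which TRF chunks lack the header.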
Many thanks in advance, Liudmila Mainzer ---------------- Senior Research Scientist National Center for Supercomputing Applications Research Assistant Professor Institute of Genomic Biology University of Illinois 217-300-0568 1205 W. Clark St. Room 4026 Urbana, IL 61801 From carsonhh at gmail.com Mon Jan 9 09:30:09 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 9 Jan 2017 09:30:09 -0700 Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names In-Reply-To: References: Message-ID: <733D5263-6CFC-4AB3-BFDD-30330B0E1985@gmail.com> The name used by maker is based on the input file name, so a quick fix would just be to rename your input file to have a shorter name. --Carson > On Jan 9, 2017, at 12:02 AM, Liudmila Sergeevna Mainzer wrote: > > Hello, MAKER developers! > > I tried submitting this bug report through the web form on the RepeatMasker web page, but I am getting an "invalid submission" message, so I decided to post here. > > I found a weird bug that results in the notorious "index out of bounds" error reported by RepeatMasker. Significantly, this error only arises on very long file names generated by MAKER. > > I traced this through the code, and identified the error to originate in Tandem Repeat finder. TRF sometimes splits up its output into separate files. When that happens, the pieces with index >1 do not contain the sequence name. Compare the first few lines between these two files: > > head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html > InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html > bgcolor="#File 1 of 2 FBF8BC">
>    Tandem Repeats Finder Program written by:
>                  Gary Benson
>                  Program in Bioinformatics
>                  Boston University
>    Version 4.09
>    Sequence: InputSequencefrag-1 CHUNK number:191 
>    size:455659  offset:57300000
>    
>    Parameters: 2 3 5 75 20 33 7
> 
> etcetera
> But also the second chunk:
> 
> head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html
> InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html 
>    bgcolor="#File 2 of 2 Found at i:56286 original size:1 final size:1
>        HREF="http://tandem.bu.edu/trf/trf.definitions.html#alignment"
>     target
>    ="explanation">Alignment explanation

> Indices: 56278--56322 Score: 55 > Period size: 1 Copynumber: 45.0 Consensus size: 1 > > etcetera > > > See how one file has the full header with the "Sequence:" statement and the other one does not? This "Sequence:" statement is used in the RepeatMasker code to name each piece of sequence that ends up being masked later. When this variable if empty (the name string is not defined), the setSubstr subroutine in the main RepeatMasker code breaks: length of an undefined string is of course zero, and that subroutine has a check for sequences whose length is shorter than the region that needs to be masked. > > So it quits with the statement "Error index out of bounds!", even though the sequence is finite length, does not have any weird characters, and is maskable. > > Once again, this only arises on very long file names, and those seem to be created by MAKER. Example: > LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific > > Notice how the last part of the file name has a bunch of identifiers separated by the %2E (generic URI-encoding)? I experimented with that file name. The path does not matter. The % signs do not matter. It is the length of the filename itself: if it is <108 characters, then RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like maybe Perl is not handling that long a name very well... > > So the problem is three-fold: MAKER creates file names that are very-very long, while RepeatMasker breaks due to TRF failing to write the file headers properly for those very long file names. > > Would you provide any suggestions or patches for this problem? It is forcing us to run RepeatMasker separately, outside the main MAKER worlflow, which really complicates the data management and analysis as a whole. 
> We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl v5.10.1 built for x86_64-linux-thread-multi. > > Many thanks in advance, > Liudmila Mainzer > > ---------------- > Senior Research Scientist > National Center for Supercomputing Applications > > Research Assistant Professor > Institute of Genomic Biology > > University of Illinois > 217-300-0568 > 1205 W. Clark St. Room 4026 > Urbana, IL 61801 > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From qlian003 at ucr.edu Wed Jan 11 22:28:32 2017 From: qlian003 at ucr.edu (Qihua Liang) Date: Wed, 11 Jan 2017 21:28:32 -0800 Subject: [maker-devel] gff file: possible sources Message-ID: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu> Hi Maker develop team! I am trying to figure out the second column of the gff file generated by maker, which should be the source of the annotation. The tutorial lists the following under "Possible Sources Include:"
* BLASTN - BLASTN alignment of EST evidence
* BLASTX - BLASTX alignment of protein evidence
* TBLASTX - TBLASTX alignment of EST evidence from closely related organisms
* EST2Genome - Polished EST alignment from Exonerate
* Protein2Genome - Polished protein alignment from Exonerate
* SNAP - SNAP ab initio gene prediction
* GENEMARK - GeneMark ab initio gene prediction
* Augustus - Augustus ab initio gene prediction
* FgenesH - FGENESH ab initio gene prediction
* Repeatmasker - RepeatMasker identified repeat
* RepeatRunner - RepeatRunner identified repeat from the repeat protein database
* tRNAScan - tRNAScan-SE tRNA predictions (coming soon)
* PASA - PASA gene predictions (coming soon)
There are other sources that I noticed in my gff file, like cdna2genome. Is there any other detailed documentation explaining such sources besides those listed above? Thanks Qihua -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dence at genetics.utah.edu Thu Jan 12 06:28:24 2017 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 12 Jan 2017 13:28:24 +0000 Subject: [maker-devel] gff file: possible sources In-Reply-To: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu> References: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu> Message-ID: Hi Qihua, the cdna2genome is the polished tblastx alignments from Exonerate. Basically, the source column should be the name of the tool that generated the alignment, prediction, or gene model. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Jan 11, 2017, at 11:28 PM, Qihua Liang > wrote: Hi Maker develop team! I am trying to figure the second column of gff file generated by maker, which should be the source of this annotation. Besides of what the tutorial lists as, Possible Sources Include: * BLASTN - BLASTN alignment of EST evidence * BLASTX - BLASTX alignment of protein evidence * TBLASTX - TBLASTX alignment of EST evidence from closely related organisms * EST2Genome - Polished EST alignment from Exonerate * Protein2Genome - Polished protein alignment from Exonerate * SNAP - SNAP ab inito gene prediction * GENEMARK - GeneMarkab inito gene prediction * Augustus - Augustus ab inito gene prediction * FgenesH - FGENESH ab inito gene prediction * Repeatmasker - RepeatMasker identified repeat * RepeatRunner - RepeatRunner identified repeat from the repeat protein database * tRNAScan - tRNAScan-SE tRNA predictions (coming soon) * PASA - PASA gene predictions (coming soon) There are other sources that I noticed from my gff file, like cdna2genome. Is there any other detailed documentation explaining such sources besides of those listed above? 
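Thanks Qihua _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From patel.kumar.vipul at gmail.com Fri Jan 20 01:44:26 2017 From: patel.kumar.vipul at gmail.com (Vipul Patel) Date: Fri, 20 Jan 2017 09:44:26 +0100 Subject: [maker-devel] Maker crash for long chrm. Message-ID: Hi, I hope someone can help me to figure out what is actually going wrong. I installed Maker 2.31.9, MPICH, BioPerl 1.7 via CPAN, pointed the TMP variable not to use NFS. The given testcase as well for 1k 1MB runs without any problems. Applying it to a sequence, for example with 57MB, it fails. I tried it as well with different sequences around 60MB, same outcome. I looked into the logs, but it was not really helpful as it just stated that the job failed. It crashed with the following message: deleted:0 genes substr outside of string at /usr/share/perl/5.18/Carp.pm line 165. ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Calling translate without a seq argument! STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/Bio/Root/Root.pm:447 STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/Bio/Tools/CodonTable.pm:419 STACK: CGL::TranslationMachine::longest_translation_plus_stop programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280 STACK: maker::auto_annotator::get_translation_seq programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236 STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../lib/Widget/snap.pm:974 STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/snap.pm:690 STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194 STACK: Process::MpiChunk::_go programs/maker/maker/bin/../lib/Process/MpiChunk.pm:1469 STACK: Process::MpiChunk::run programs/maker/maker/bin/../lib/Process/MpiChunk.pm:341 STACK: programs/maker/maker/bin/maker:979 ----------------------------------------------------------- --> rank=16, hostname=dummy ERROR: Failed while gathering ab-init output files ERROR: Chunk failed at level:1, tier_type:2 FAILED CONTIG:chr_test ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:chr_test examining contents of the fasta file and run log --Next Contig-- Processing run.log file... I got the same message if I run it without MPI, so I guess it is not an MPI issue. How can I find out if some jobs died, which could maybe lead to this problem? Other ideas how I can tackle this problem? Kind regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From patel.kumar.vipul at gmail.com Fri Jan 20 06:34:28 2017 From: patel.kumar.vipul at gmail.com (Vipul Patel) Date: Fri, 20 Jan 2017 14:34:28 +0100 Subject: [maker-devel] Maker crash for long chrm. In-Reply-To: References: Message-ID: Solved. After some digging and printing I found the problem. It was snap itself! For anybody who runs into the same problem, check snap. Apparently it was not compiled correctly and therefore produced non-conforming output! Recompiling solved my issue. Kind regards 2017-01-20 9:44 GMT+01:00 Vipul Patel : > Hi, > > I hope someone can help me to figure out what is actually going wrong.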
> > I installed Maker 2.31.9, MPICH , BioPerl 1.7 via CPAN, pointed the TMP > variable not to use NFS. The given testcase as well for 1k 1MB runs without any problems. > > Applying it to a sequence, for example with 57MB it failes, I tried it as > well with a different sequences around 60MB, same outcome. > > I looked into the logs, but it was not really helpful as it was just > stated that the job failed > > It crashed with following message: > > deleted:0 genes > substr outside of string at /usr/share/perl/5.18/Carp.pm line 165. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Calling translate without a seq argument! > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/ > Bio/Root/Root.pm:447 > STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/ > Bio/Tools/CodonTable.pm:419 > STACK: CGL::TranslationMachine::longest_translation_plus_stop > programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280 > STACK: maker::auto_annotator::get_translation_seq > programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236 > STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../ > lib/Widget/snap.pm:974 > STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/ > snap.pm:690 > STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194 > STACK: Process::MpiChunk::_go programs/maker/maker/bin/../ > lib/Process/MpiChunk.pm:1469 > STACK: Process::MpiChunk::run programs/maker/maker/bin/../ > lib/Process/MpiChunk.pm:341 > STACK: programs/maker/maker/bin/maker:979 > ----------------------------------------------------------- > --> rank=16, hostname=dummy > ERROR: Failed while gathering ab-init output files > ERROR: Chunk failed at level:1, tier_type:2 > FAILED CONTIG:chr_test > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:chr_test > > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... 
> > I got the same message if I run it without MPI, So I can guess it is not > an MPI issue. > How can I find out if some jobs died so maybe this could lead to this > problem? > Other ideas how I can tackle this problem? > > Kind regards > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Jan 20 15:00:49 2017 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 20 Jan 2017 15:00:49 -0700 Subject: [maker-devel] Maker crash for long chrm. In-Reply-To: References: Message-ID: <59841676-741F-496D-9E47-7750417033A4@gmail.com> I'm glad it's working for you. Let us know if anything else comes up. --Carson > On Jan 20, 2017, at 6:34 AM, Vipul Patel wrote: > > Solved. After some digging and printing I found out the problem. > > It was snap itself! > > For anybody who maybe runs in the same problem, check snap. Apparently it was not correctly compiled and therefore it produced a not conform output! Recompiling solved my issue. > > Kind regards > > 2017-01-20 9:44 GMT+01:00 Vipul Patel >: > Hi, > > I hope someone can help me to figure out what is actually going wrong. > > I installed Maker 2.31.9, MPICH , BioPerl 1.7 via CPAN, pointed the TMP variable not to use NFS. The given testcase as well for 1k 1MB runs without any problems. > > Applying it to a sequence, for example with 57MB it failes, I tried it as well with a different sequences around 60MB, same outcome. > > I looked into the logs, but it was not really helpful as it was just stated that the job failed > > It crashed with following message: > > deleted:0 genes > substr outside of string at /usr/share/perl/5.18/Carp.pm line 165. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Calling translate without a seq argument!
> STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/Bio/Root/Root.pm:447 > STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/Bio/Tools/CodonTable.pm:419 > STACK: CGL::TranslationMachine::longest_translation_plus_stop programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280 > STACK: maker::auto_annotator::get_translation_seq programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236 > STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../lib/Widget/snap.pm:974 > STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/snap.pm:690 > STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194 > STACK: Process::MpiChunk::_go programs/maker/maker/bin/../lib/Process/MpiChunk.pm:1469 > STACK: Process::MpiChunk::run programs/maker/maker/bin/../lib/Process/MpiChunk.pm:341 > STACK: programs/maker/maker/bin/maker:979 > ----------------------------------------------------------- > --> rank=16, hostname=dummy > ERROR: Failed while gathering ab-init output files > ERROR: Chunk failed at level:1, tier_type:2 > FAILED CONTIG:chr_test > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:chr_test > > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... > > I got the same message if I run it without MPI, So I can guess it is not an MPI issue. > How can I find out if some jobs died so maybe this could lead to this problem? > Other ideas how I can tackle this problem? > > Kind regards > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mayabritstein at gmail.com Mon Jan 23 01:30:40 2017 From: mayabritstein at gmail.com (Maya Britstein) Date: Mon, 23 Jan 2017 10:30:40 +0200 Subject: [maker-devel] Authorization failed. 
Message-ID: Hi, I can't access the maker-devel archives. I am entering my email, and what I think is my password, but still it doesn't work. thanks, Maya -------------- next part -------------- An HTML attachment was scrubbed... URL: From bmoore at genetics.utah.edu Mon Jan 23 05:43:53 2017 From: bmoore at genetics.utah.edu (Barry Moore) Date: Mon, 23 Jan 2017 12:43:53 +0000 Subject: [maker-devel] Authorization failed. In-Reply-To: References: Message-ID: Hi Maya, If you follow the link below you will find at the bottom of the page a portion of the form that allows you to reset your password. It's a little misleading because it looks like it's only an "Unsubscribe" option, but it also takes you to a page that allows you to update your subscription details including password reminder/reset. The actual text for the portion of the page you're looking for is this: 'To unsubscribe from maker-devel, get a password reminder, or change your subscription options enter your subscription email address:' The link is: http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Regards, Barry On Jan 23, 2017, at 1:30 AM, Maya Britstein wrote: Hi, I can't access the maker-devel archives. I am entering my email, and what I think is my password, but still it doesn't work. thanks, Maya _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daren.card at gmail.com Tue Jan 24 07:06:22 2017 From: daren.card at gmail.com (Daren C. Card) Date: Tue, 24 Jan 2017 08:06:22 -0600 Subject: [maker-devel] Maker error: Invalid nucleotide Message-ID: Hi everyone, I'm getting an error with an ongoing Maker run that I'm trying to troubleshoot.
This is on a 2nd Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I'm getting the following type of error message for each contig: ... Widget::augustus: /opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus #-------------------------------# /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR Invalid nucleotide '8' encountered. /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR Invalid nucleotide '8' encountered. ERROR: Augustus failed --> rank=7, hostname=moonunit0 ERROR: Failed while preparing ab-inits ERROR: Chunk failed at level:0, tier_type:2 FAILED CONTIG:scaffold-92 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:scaffold-92 examining contents of the fasta file and run log ... Augustus is apparently encountering '8' nucleotides, which is weird. I've looked within the contig fasta file in /tmp/ and there are no '8's anywhere except the header lines. Everything else appears to be running without issues. Any guidance on how I might further interpret and solve this issue would be greatly appreciated. Can provide more information if necessary. Thanks, Daren Card UT-Arlington From carsonhh at gmail.com Wed Jan 25 10:37:50 2017 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Jan 2017 10:37:50 -0700 Subject: [maker-devel] Maker error: Invalid nucleotide In-Reply-To: References: Message-ID: <5E13AB7E-9175-4440-AD62-A53BD9DD8DE1@gmail.com> Try running the contig in question (scaffold-92) as a separate MAKER run. That may help indicate if the issue may be a corrupt intermediate file (if it is, you can set clean_try=1 to force deletion of intermediate files before rerun). --Carson > On Jan 24, 2017, at 7:06 AM, Daren C.
Card wrote: > > Hi everyone, > > I'm getting an error with an ongoing Maker run that I'm trying to troubleshoot. This is on a 2nd Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I'm getting the following type of error message for each contig: > > ... > Widget::augustus: > /opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus > #-------------------------------# > > /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR > Invalid nucleotide '8' encountered. > > > /opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR > Invalid nucleotide '8' encountered. > > ERROR: Augustus failed > --> rank=7, hostname=moonunit0 > ERROR: Failed while preparing ab-inits > ERROR: Chunk failed at level:0, tier_type:2 > FAILED CONTIG:scaffold-92 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:scaffold-92 > > examining contents of the fasta file and run log > ... > > Augustus is apparently encountering '8' nucleotides, which is weird. I've looked within the contig fasta file in /tmp/ and there are no '8's anywhere except the header lines. Everything else appears to be running without issues. > > Any guidance on how I might further interpret and solve this issue would be greatly appreciated. Can provide more information if necessary.
> > Thanks, > Daren Card > > UT-Arlington > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From scott at scottcain.net Wed Jan 25 13:23:02 2017 From: scott at scottcain.net (Scott Cain) Date: Wed, 25 Jan 2017 15:23:02 -0500 Subject: [maker-devel] GFF3 file format In-Reply-To: References: Message-ID: Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein wrote: > Hi, > > I have RNA-seq data, and genomic data that I want to annotate using maker. > > From what I understood, I need to generate a GFF3 file from the > RNA-seq mapping sequences. I had mapped the RNA sequences to the genome > using bowtie and tophat. However, I still do not know how to take these > formats and convert them to a GFF3 file that I can then use in maker as > annotation evidence. > > I saw the wiki page, but it did not mention how to make this conversion ( > http://gmod.org/wiki/GFF3) > > Can you please help me? > > Sincerely, > Maya > > ---- > Maya Britstein > Ph.D candidate > Laura Steindler's Lab > Marine Biology Department > Leon H. Charney School of Marine Sciences > University of Haifa, Israel > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
URL: From cjfields at illinois.edu Wed Jan 25 15:03:51 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 25 Jan 2017 22:03:51 +0000 Subject: [maker-devel] GFF3 file format In-Reply-To: References: Message-ID: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly. But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. remove low confidence transcripts) and bring them as EST evidence. chris From: maker-devel > on behalf of Scott Cain > Date: Wednesday, January 25, 2017 at 2:23 PM To: Maya Britstein > Cc: "maker-devel at yandell-lab.org List" >, "help at gmod.org" > Subject: Re: [maker-devel] GFF3 file format Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein > wrote: Hi, I have RNA-seq data, and genomic data that I want to annotate using maker. From what I understood, I need to genarate a gff3 file format from the RNA-seq mapping sequences. I had mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to take these format and convert them to a gff3 file that I can them use in maker as annotation evidence I saw the wiki page, that did not mention how to make this conversion (http://gmod.org/wiki/GFF3) Can you please help me? Sincerely, Maya ---- Maya Britstein Ph.D candidate Laura Steindler's Lab Marine Biology Department Leon H. Charney School of Marine Sciences University of Haifa, Israel -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Thu Jan 26 13:26:42 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Thu, 26 Jan 2017 15:26:42 -0500 Subject: [maker-devel] canonical protein sequences or isoform? Message-ID: Hello: I am doing annotation on a new genome and collecting proteins from mouse. I found there are both canonical protein sequences and isoforms. I wonder whether I should use only canonical protein sequences or both the canonical sequences and the isoforms? Thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainer.rutka at uni-konstanz.de Fri Jan 27 03:31:40 2017 From: rainer.rutka at uni-konstanz.de (Rainer Rutka) Date: Fri, 27 Jan 2017 11:31:40 +0100 Subject: [maker-devel] Maker-Error when started with OpenMPI Message-ID: Hi everybody. My name is Rainer. I am an administrator for our HPC-Systems at our university in Konstanz, Baden-Wuerttemberg/Germany. The project is called bwHPC-C5. See: https://www.bwhpc-c5.de/en/index.php I have been trying to get Maker running on our bwUniCluster for weeks. Unfortunately I get errors while running a Maker job in the MPI-environment. BUILD STATUS ============================================================================== STATUS MAKER v2.31.9 ============================================================================== PERL Dependencies: VERIFIED External Programs: VERIFIED External C Libraries: VERIFIED MPI SUPPORT: ENABLED MWAS Web Interface: DISABLED MAKER PACKAGE: CONFIGURATION OK MODULES / INCLUDES / COMPILERS # knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9 # ##### (B) Dependencies: # # conflict: any other maker version # module load compiler/gnu/5.2 # module load mpi/openmpi/2.0-gnu-5.2 [...] MPI/MOAB SUBMIT [...]
### Queues ### #MSUB -q fat #MSUB -l nodes=1:ppn=16 #MSUB -l mem=20gb #MSUB -l walltime=50:00:00 # [...] echo " " echo "### Loading MAKER module:" echo " " module load bio/maker/2.31.9 [ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; } echo "MAKER_VERSION = $MAKER_VERSION" module list [...] echo " " echo "### Runing Maker example" echo " " export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so export OMPI_MCA_mpi_warn_on_fork=0 echo "LD_PRELOAD=${LD_PRELOAD}" # # "STATUS: Processing and indexing input FASTA files..." # mpiexec -mca btl ^openib -n 16 maker [...] E R R O R S ======= [...] LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... [uc1n338:113607] *** Process received signal *** [uc1n338:113607] Signal: Segmentation fault (11) [uc1n338:113607] Signal code: Address not mapped (1) [uc1n338:113607] Failing at address: 0x4b0 [uc1n338:113608] *** Process received signal *** [uc1n338:113608] Signal: Segmentation fault (11) [uc1n338:113608] Signal code: Address not mapped (1) [uc1n338:113608] Failing at address: 0x4b0 [uc1n338:113621] *** Process received signal *** [uc1n338:113621] Signal: Segmentation fault (11) [uc1n338:113621] Signal code: Address not mapped (1) [uc1n338:113621] Failing at address: 0x4b0 -------------------------------------------------------------------------- mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault). -------------------------------------------------------------------------- [...] WHATS WRONG HERE!? Thank you for your help! All the best , Rainer -- Rainer Rutka University of Konstanz Communication, Information, Media Centre (KIM) * High-Performance-Computing (HPC) * KIM-Support and -Base-Services Room: V511 78457 Konstanz, Germany +49 7531 88-5413 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5055 bytes Desc: S/MIME Cryptographic Signature URL: From michael.s.campbell1 at gmail.com Fri Jan 27 08:36:11 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Fri, 27 Jan 2017 10:36:11 -0500 Subject: [maker-devel] canonical protein sequences or isoform? In-Reply-To: References: Message-ID: I give MAKER all isoforms as evidence. Mike > On Jan 26, 2017, at 3:26 PM, Quanwei Zhang wrote: > > Hello: > > I am doing annotation on a new genome and collecting proteins from mouse. I found there are both canonical protein sequences and isoforms. I wonder whether I should use only cannonical protein sequences or both the canonical and isoforms? > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From qwzhang0601 at gmail.com Fri Jan 27 09:13:22 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Fri, 27 Jan 2017 11:13:22 -0500 Subject: [maker-devel] transcript assembly of RNA-seq data Message-ID: Hello: I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly. (1) De novo assembly without mapping to any genome assembly (like Trinity)? (2) TopHat+Cufflink do mapping to the new genome assembly, that want to annotate? (3) TopHat+Cufflink do mapping to a close annotated genome (like mouse or human)? Thanks Best Quanwei -------------- next part -------------- An HTML attachment was scrubbed... 
From carsonhh at gmail.com Fri Jan 27 09:23:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 09:23:40 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To:
References:
Message-ID: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>

(1) De novo assembly without mapping to any genome assembly (like Trinity)

You get a lower false positive rate (TopHat+Cufflinks is too noisy). And protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.

--Carson

From cjfields at illinois.edu Fri Jan 27 15:21:15 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jan 2017 22:21:15 +0000
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
References: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
Message-ID: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>

Yup I agree. Carson, would you know of any instances where HISAT2/STAR+StringTie or reference-based Trinity assemblies were (successfully) used?

chris

From carsonhh at gmail.com Fri Jan 27 17:53:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 17:53:10 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
References: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com> <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
Message-ID:

No. My experience has just been with regular Trinity de novo assembly. Of course, I'd be interested in anyone else's attempt at this though.

--Carson

From carsonhh at gmail.com Sat Jan 28 13:53:45 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 28 Jan 2017 13:53:45 -0700
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To:
References:
Message-ID: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>

Try adding one of the following to your mpiexec command:

1. --mca btl ^openib
2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0

One or the other may fix your issue. The first causes OpenMPI to not use the InfiniBand communication option (InfiniBand libraries use registered memory in a way that causes system calls to generate segfaults). It will usually force communication to go over another adapter. The second tries to use the InfiniBand adapter, but uses TCP over InfiniBand (a way to indirectly bypass the problem-causing libraries). The third specifically forces the use of the Ethernet adapter instead of the InfiniBand adapter.

--Carson

From rainer.rutka at uni-konstanz.de Mon Jan 30 01:32:08 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Mon, 30 Jan 2017 09:32:08 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
References: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
Message-ID:

Hi Carson!

Thank you VERY MUCH for your hints. Much appreciated!
I'll test these today and let you know about the results.

Again: THANKS! :-)

BTW: I'm not a scientist. Only a system operator. :-)

On 28.01.2017 at 21:53, Carson Holt wrote:
> Try adding one of the following to your mpiexec command:
> 1. --mca btl ^openib
> 2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
> 3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0
> One or the other may fix your issue. The first causes OpenMPI to not use the InfiniBand communication option (InfiniBand libraries use registered memory in a way that causes system calls to generate segfaults). It will usually force communication to go over another adapter. The second tries to use the InfiniBand adapter, but uses TCP over InfiniBand (a way to indirectly bypass the problem-causing libraries). The third specifically forces the use of the Ethernet adapter instead of the InfiniBand adapter.
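
A minimal sketch of how the three transport options listed in this thread could be laid out for testing one at a time. The function name `mpi_variants` is hypothetical, the interface names `ib0` and `eth0` are taken from Carson's message and will likely differ between clusters, and the rank count of 16 comes from the posted job script. The posted script already passes `-mca btl ^openib`, so options 2 and 3 are the ones that appear untried there. The function only prints command lines; nothing is executed.

```shell
#!/bin/sh
# Print one candidate mpiexec command line per transport setting.
# Nothing is run here; copy the line you want to try into the job script.
mpi_variants() (
    nprocs=$1   # number of MPI ranks, e.g. 16 for ppn=16
    # 1. Exclude the openib (InfiniBand verbs) transport entirely.
    printf '%s\n' "mpiexec --mca btl ^openib -n ${nprocs} maker"
    # 2. Shared memory + TCP, with TCP bound to the ib0 interface (TCP over InfiniBand).
    printf '%s\n' "mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 -n ${nprocs} maker"
    # 3. Shared memory + TCP, with TCP bound to the eth0 (Ethernet) interface.
    printf '%s\n' "mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include eth0 -n ${nprocs} maker"
)

mpi_variants 16
```

Trying the variants in separate short jobs, each with its own log file, should make it easier to see which transport (if any) avoids the segfault.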

From qwzhang0601 at gmail.com Tue Jan 31 10:36:13 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 12:36:13 -0500
Subject: [maker-devel] collecting protein sequences as evidences
Message-ID:

I wonder what's the best way to collect protein sequences for gene annotation of a de novo genome assembly.
(1) My first choice is to get protein sequences of human and mouse from UniProt. At this step, I am not clear whether I should download the reviewed ones (i.e., Swiss-Prot) or the automatically annotated ones (i.e., TrEMBL).
(2) On the other hand, I also get protein sequences from NCBI. Should I simply merge those FASTA files? Does it matter if there are redundancies? Also, if I get protein sequences from different sources, they may not have the same quality. Do I need to do something before I integrate protein sequences from different sources?

Many thanks

Best
Quanwei

From qwzhang0601 at gmail.com Tue Jan 31 12:08:21 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 14:08:21 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals
Message-ID:

Hello:

I am trying to assemble transcripts from RNA-seq data with Trinity, to be used as evidence for gene annotation in MAKER. I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to MAKER? Merging all samples into one gives a much higher coverage level, but I think some genes may be expressed as tissue-specific isoforms. So I am not sure whether I should merge RNA-seq data from different tissues.
What's more, I found some published RNA-seq data from another individual of the same species (and from a different tissue than ours). Should I merge all RNA-seq data together (across individuals and tissues)? Or should I generate separate transcript assemblies and use all of them as input to MAKER?

Thanks
Best
Quanwei

From michael.s.campbell1 at gmail.com Tue Jan 31 12:26:29 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 14:26:29 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals
In-Reply-To:
References:
Message-ID: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>

I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file.

Example:
est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta

Good luck,
Mike

From michael.s.campbell1 at gmail.com Tue Jan 31 13:57:28 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 15:57:28 -0500
Subject: [maker-devel] collecting protein sequences as evidences
In-Reply-To:
References:
Message-ID: <2E4D90C9-6D6E-4F52-A361-AFB06A61D2C2@gmail.com>

Hi Quanwei,

(1) When I use UniProt I use Swiss-Prot and not TrEMBL.
(2) I don't merge files together. I just pass them all to MAKER as a comma separated list.

Thanks,
Mike

From cjfields at illinois.edu Tue Jan 31 14:05:43 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 21:05:43 +0000
Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals
In-Reply-To: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
Message-ID:

I agree with Mike. I also suggest not combining RNA-seq data from different runs (e.g. different studies) even if they are from the same tissue, development stage, etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc.) that can significantly and negatively impact transcript assembly quality.

chris

From mjfi2sb3 at gmail.com Tue Jan 31 14:14:14 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Tue, 31 Jan 2017 21:14:14 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
References: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
Message-ID:

Hi Christopher,

How would you identify a low confidence transcript? And how do you remove them?

Also, did you try setting a minimum read coverage in Trinity, as the default is one?

Best
/SB
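
A rough sketch of the comma-separated lists Mike describes in the "Transcript assembly" and "collecting protein sequences" threads, written as a small helper that builds the `est=` and `protein=` lines for the MAKER opts file. The helper name `join_by_comma` and all file paths are hypothetical placeholders (one Trinity assembly per tissue, protein sets kept as separate files); only the `est=` and `protein=` option names and the comma-separated form come from MAKER itself.

```shell
#!/bin/sh
# Join all arguments with commas and no spaces.
# The function body runs in a subshell so the IFS change does not leak out.
join_by_comma() (
    IFS=,
    printf '%s' "$*"
)

# Hypothetical per-tissue Trinity assemblies (replicates merged within each tissue).
est_list=$(join_by_comma /PATH/TO/tissueA_Trinity.fasta /PATH/TO/tissueB_Trinity.fasta)

# Hypothetical protein sets, passed as separate files rather than concatenated.
protein_list=$(join_by_comma /PATH/TO/uniprot_sprot.fasta /PATH/TO/ncbi_proteins.fasta)

# Print the two lines as they would appear in the opts file.
printf 'est=%s\n' "$est_list"
printf 'protein=%s\n' "$protein_list"
```

Keeping one file per tissue or per source, instead of concatenating, also makes it easy to drop a single evidence set later without redoing the others.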
URL: From qwzhang0601 at gmail.com Tue Jan 31 14:33:12 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 31 Jan 2017 16:33:12 -0500 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> Message-ID: Thank you guys for your suggestions. So you do not suggest to use RNA-seq data from another study, even I assemble them separately and then provide both assemblies into Maker as a comma separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. Best Quanwei 2017-01-31 16:05 GMT-05:00 Fields, Christopher J : > I agree with Mike. I also suggest not combining RNA-Seqs from different > runs (e.g. different studies) even if they are from the same tissue, > development stage etc. There are many other factors (biological variation, > sample quality, sequencing chemistry or technology differences, etc) that > can significantly and negatively impact trx assembly quality. > > chris > > On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" < > maker-devel-bounces at yandell-lab.org on behalf of > michael.s.campbell1 at gmail.com> wrote: > > I would probably try merging the replicates but not the tissues. You > can then pass the output files to MAKER in a comma separated list in the > opts file. > > Example: > est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta > > Good luck, > Mike > > > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > I am trying to assemble transcripts using RNA-seq data by the tool > Trinity, which will be used for gene annotation for Maker. Now I have data > from two tissues with two replicates each. Should I merge all four samples > to get one assembly file? 
Or should I merge replicates of each tissue > separately and use the two assembly files as input of Maker. Merging all > samples into one, we will have much higher coverage level, but I think > there may be some genes expressed by tissue-specific isoforms. So I not > sure whether I should merge RNA-seq from different tissues. > > What's more, I find some published RNA-seq data from another > individual (and also for different tissue from us) for the same species. > Should I merge all RNA-seq together (across individuals and tissues)? Or > should I generate different transcript assembly and use all those > assemblies as input to Maker? > > > > Thanks > > Best > > Quanwei > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_ > yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_ > yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Jan 31 14:35:20 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 31 Jan 2017 14:35:20 -0700 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> Message-ID: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> I think he means not to combine them for the transcript assembly preparation (i.e. assembly them separately). But you still provide them all to maker as a comma separated list. ?Carson > On Jan 31, 2017, at 2:33 PM, Quanwei Zhang wrote: > > Thank you guys for your suggestions. So you do not suggest to use RNA-seq data from another study, even I assemble them separately and then provide both assemblies into Maker as a comma separated list. 
The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. > > Best > Quanwei > > 2017-01-31 16:05 GMT-05:00 Fields, Christopher J >: > I agree with Mike. I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality. > > chris > > On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" on behalf of michael.s.campbell1 at gmail.com > wrote: > > I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file. > > Example: > est=/PATH/TO/file1.fsata,/PATH/TO/file2.fasta > > Good luck, > Mike > > > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang > wrote: > > > > Hello: > > > > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues. > > What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker? 
> > > > Thanks > > Best > > Quanwei > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Tue Jan 31 16:05:43 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 31 Jan 2017 23:05:43 +0000 Subject: [maker-devel] GFF3 file format In-Reply-To: References: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu> Message-ID: <8BD384C9-4E46-42AC-A59F-96299EF5E104@illinois.edu> You can use RSEM for some initial filtering: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification#filtering-transcripts Then I generally use the Trinity QA steps, in particular TransRate or DETONATE: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment chris From: Salim Bougouffa Date: Tuesday, January 31, 2017 at 3:14 PM To: Chris Fields , Scott Cain , Maya Britstein Cc: "maker-devel at yandell-lab.org List" , "help at gmod.org" Subject: Re: [maker-devel] GFF3 file format Hi Christopher, How would you identify a low confidence transcript? And how do you remove them? Also, did you try setting a minimum read coverage in Trinity as the default is one? Best /SB On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, > wrote: If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. 
Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly. But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. removing low-confidence transcripts) and bringing them in as EST evidence. chris From: maker-devel on behalf of Scott Cain Date: Wednesday, January 25, 2017 at 2:23 PM To: Maya Britstein Cc: "maker-devel at yandell-lab.org List", "help at gmod.org" Subject: Re: [maker-devel] GFF3 file format Hi Maya, I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list. Scott On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein wrote: Hi, I have RNA-seq data and genomic data that I want to annotate using MAKER. From what I understood, I need to generate a GFF3 file from the RNA-seq mapping. I have mapped the RNA-seq reads to the genome using Bowtie and TopHat. However, I still do not know how to convert that output to a GFF3 file that I can then use in MAKER as annotation evidence. I saw the wiki page, but it did not mention how to make this conversion (http://gmod.org/wiki/GFF3). Can you please help me? Sincerely, Maya ---- Maya Britstein Ph.D candidate Laura Steindler's Lab Marine Biology Department Leon H. Charney School of Marine Sciences University of Haifa, Israel -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -- ____________________________ Sent from Inbox Mobile -------------- next part -------------- An HTML attachment was scrubbed...
URL: From cjfields at illinois.edu Tue Jan 31 16:07:44 2017 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 31 Jan 2017 23:07:44 +0000 Subject: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals In-Reply-To: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> References: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com> <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com> Message-ID: Exactly chris From: Carson Holt Date: Tuesday, January 31, 2017 at 3:35 PM To: Quanwei Zhang Cc: Chris Fields, "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals I think he means not to combine them for the transcript assembly preparation (i.e. assemble them separately). But you still provide them all to MAKER as a comma separated list. --Carson On Jan 31, 2017, at 2:33 PM, Quanwei Zhang wrote: Thank you guys for your suggestions. So you do not suggest using RNA-seq data from another study, even if I assemble them separately and then provide both assemblies to MAKER as a comma separated list? The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it. Best Quanwei 2017-01-31 16:05 GMT-05:00 Fields, Christopher J: I agree with Mike. I also suggest not combining RNA-Seq data from different runs (e.g. different studies) even if they are from the same tissue, developmental stage, etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc.) that can significantly and negatively impact transcript assembly quality. chris On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" (michael.s.campbell1 at gmail.com) wrote: I would probably try merging the replicates but not the tissues.
You can then pass the output files to MAKER in a comma separated list in the opts file. Example: est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta Good luck, Mike > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang wrote: > > Hello: > > I am trying to assemble transcripts from RNA-seq data with Trinity; the assemblies will be used as evidence for gene annotation in MAKER. I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to MAKER? Merging all samples into one gives much higher coverage, but I think some genes may be expressed as tissue-specific isoforms, so I am not sure whether I should merge RNA-seq from different tissues. > What's more, I found some published RNA-seq data from another individual (and from a different tissue than ours) for the same species. Should I merge all the RNA-seq data together (across individuals and tissues)? Or should I generate a separate transcript assembly for each and use all those assemblies as input to MAKER? > > Thanks > Best > Quanwei -------------- next part -------------- An HTML attachment was scrubbed... URL:
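The advice in this thread (assemble each tissue or study separately, then give MAKER every assembly as one comma separated est= entry) can be sketched in a few lines of Python. This is only an illustration, not part of MAKER: the helper names build_est_line and set_est and the tissue file names are hypothetical, and the sketch assumes the opts file uses plain key=value lines with an optional trailing #comment.

```python
def build_est_line(assemblies):
    """Return an 'est=' line listing the FASTA paths as a comma separated list."""
    paths = list(assemblies)
    if not paths:
        raise ValueError("at least one assembly is required")
    for p in paths:
        # A comma or whitespace inside a path would corrupt the list.
        if "," in p or any(ch.isspace() for ch in p):
            raise ValueError(f"path not usable in a comma separated list: {p!r}")
    return "est=" + ",".join(paths)


def set_est(opts_text, assemblies):
    """Replace the first est= line in the opts text, or append one if absent."""
    new_line = build_est_line(assemblies)
    out, replaced = [], False
    for line in opts_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "est" and not replaced:
            # Keep a trailing '#comment' if the line has one (assumed layout).
            _, _, comment = line.partition("#")
            out.append(new_line + (" #" + comment if comment else ""))
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical file names: one Trinity assembly per tissue.
    opts = "genome=/PATH/TO/genome.fasta\nest= #assembled transcripts\nest_gff=\n"
    print(set_est(opts, ["/PATH/TO/tissue1_trinity.fasta",
                         "/PATH/TO/tissue2_trinity.fasta"]))
```

Only the line whose key is exactly est is touched, so other keys such as est_gff are left alone. Rejecting paths that contain commas or spaces is a design choice of this sketch, since either would silently break the list.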