From rob.syme at gmail.com  Thu Jan  5 00:41:25 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Thu, 05 Jan 2017 06:41:25 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
Message-ID: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>

Hi all

The MAKER wiki page "Repeat Library Construction - Advanced
<http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced>"
describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded
MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there.
Are they distributed with MAKER or separately? Does anybody know where to
find them?

Thanks!

Rob Syme
Research Associate
Curtin University

From olegl at volcani.agri.gov.il  Thu Jan  5 04:07:31 2017
From: olegl at volcani.agri.gov.il (Oleg Lovky)
Date: Thu, 5 Jan 2017 10:07:31 +0000
Subject: [maker-devel] Unable to train SNAP
Message-ID: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>

Hello,

I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000).
However, I'm not getting transcript and protein FASTA files at all, even though Maker gives no errors and everything is listed as finished in the datastore log file.
Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files.

Please advise.

Regards,

Oleg Lovky, MSc.
Research Engineer
Institute of Plant Sciences
ARO, Volcani Center
Cell: 054-4870319

From michael.s.campbell1 at gmail.com  Thu Jan  5 08:54:17 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 5 Jan 2017 09:54:17 -0500
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <3B3F80CA-BFA1-4F0E-A2F1-CA60E8496D5F@gmail.com>

Hi Rob,

There is a link near the bottom of that wiki page at the end of this line

"CRL and other custom scripts are available here."

That points to this URL: http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz

Thanks,
Mike

From rob.syme at gmail.com  Thu Jan  5 19:29:35 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Fri, 06 Jan 2017 01:29:35 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <CAEf4xgcZXf18ZWD9JusvrsyUdLCg_wOe2SuA2d91mnTcug+u1w@mail.gmail.com>

Oh dear. That's embarrassing for me! Sorry for the silly question.

-r


From xvazquezc at gmail.com  Thu Jan  5 20:23:17 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Fri, 6 Jan 2017 13:23:17 +1100
Subject: [maker-devel] Unable to train SNAP
In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
Message-ID: <CAL0hg4EEQk5CWkrni6-o29m_mAOkYjLKqjA8Df04FKJMbfDB8g@mail.gmail.com>

Are you using the -n option with maker2zff? You often get empty genome.ann
and genome.dna files if you don't.


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com  Fri Jan  6 13:28:02 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 6 Jan 2017 12:28:02 -0700
Subject: [maker-devel] Unable to train SNAP
In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
Message-ID: <8F65E561-7450-4B5A-8F1B-4E51C0D25BE2@gmail.com>

The maker2zff script has a number of thresholds that must be met to avoid filtering out all models. For example, if you don't have protein evidence in the dataset, that filter may always fail. You may just want to turn all filters off with the -n option, as previously suggested.
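
To see whether filtering is what empties the output, it can help to count how many gene models actually reached genome.ann. The following is a minimal Python sketch, not part of MAKER. It assumes the short four-column ZFF layout that SNAP reads ('>' header lines naming a sequence, followed by "label begin end group" feature lines), and the function name is made up for illustration.

```python
def count_zff_models(lines):
    """Return {sequence_name: number_of_distinct_gene_models} for ZFF text.

    Assumes the short ZFF layout: '>' header lines, then feature lines
    whose last whitespace-separated field is the gene model (group) name.
    """
    models_by_sequence = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            current = models_by_sequence.setdefault(name, set())
        elif current is not None:
            # Feature line: the group name ties exons to one gene model.
            current.add(line.split()[-1])
    return {name: len(models) for name, models in models_by_sequence.items()}
```

Passing an open handle on genome.ann to count_zff_models and comparing the totals with and without -n shows how many models the thresholds removed.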

--Carson



From kchilds at msu.edu  Thu Jan  5 08:28:00 2017
From: kchilds at msu.edu (Childs, Kevin)
Date: Thu, 5 Jan 2017 14:28:00 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <6AE4044B-9011-4421-A6F1-FE3B95BBB11D@msu.edu>

Rob,

The scripts can be found in a link at the bottom of this wiki page:

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced

Kevin Childs

---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Center for Genomics-Enabled Plant Science
Plant Biology Department
Michigan State University

kchilds at msu.edu
517-775-2844 (m)
517-884-6926 (o)

http://childslab.plantbiology.msu.edu




From brubin at fieldmuseum.org  Fri Jan  6 19:22:10 2017
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Fri, 6 Jan 2017 20:22:10 -0500
Subject: [maker-devel] /tmp full
Message-ID: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>

Hi all,

Maker keeps filling up the /tmp directories on the cluster I am using. It
appears that most of the space is taken with many versions of various blast
databases. I suspect that this issue is partly due to my not using MPI and
instead launching multiple instances of maker (typically 16) in the same
working directory. However, it appears that maker is also leaving some of
these databases in /tmp even after it has died or been killed and they are
piling up.

I am submitting my jobs to the cluster via SLURM but have installed maker
locally rather than system-wide. My system administrator is going to try
creating a larger locally mounted directory on some of the nodes for me but
I wanted to check to see if you have any other suggestions to solve the
issue or make sure that maker cleans up /tmp as aggressively as possible.

I am using maker3-beta.

Thanks for any help,
Ben

From carsonhh at gmail.com  Sat Jan  7 17:29:29 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 7 Jan 2017 16:29:29 -0700
Subject: [maker-devel] /tmp full
In-Reply-To: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
References: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
Message-ID: <DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>

If you use the MPI settings, then all processes will share a single temporary directory; otherwise they will each have a separate one, since they can't intercommunicate.

MAKER tries to clean up its files on finish or failure, but if you or the system kill it with certain signals, then it is reaped immediately by the system and not allowed to finish cleaning up. Signals 9 and 19, for example, will do that. If a failure is related to the drive being full or a memory issue, then your system may be hitting it with one of these uncatchable signals. For example, SLURM may use signal 9 or 19 if a process fails to respond to signal 15 in a timely manner (i.e. MAKER may be removing files, but SLURM gets impatient and kills it more aggressively because it thinks the process is not responding). You can always try to empty /tmp as the first step in your batch script, so that files belonging to you are removed before MAKER launches.
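
The difference between catchable and uncatchable signals can be checked directly. The following is a small Python sketch rather than MAKER code (MAKER itself is Perl). It assumes a POSIX system and is meant to be run from the main thread of a script. It uses signal names instead of numbers, because the numbering of signals such as SIGSTOP differs between platforms. The helper name is invented for this example.

```python
import signal


def can_install_handler(signum):
    """Return True if this process is allowed to install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda signo, frame: None)
    except (OSError, ValueError):
        # The kernel refuses handlers for SIGKILL and SIGSTOP, so a
        # program never gets a chance to run clean-up code for them.
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the earlier handler back
    return True


for name in ("SIGTERM", "SIGKILL", "SIGSTOP"):
    print(name, can_install_handler(getattr(signal, name)))
```

Only SIGTERM (signal 15) should be reported as catchable, which is why a job that is given enough time after signal 15 can still tidy up, while one hit with signal 9 cannot.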

--Carson




From brubin at fieldmuseum.org  Sun Jan  8 10:24:36 2017
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Sun, 8 Jan 2017 11:24:36 -0500
Subject: [maker-devel] /tmp full
In-Reply-To: <DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>
References: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
	<DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>
Message-ID: <CAKpVPBLfiYakZ3Ce2q02gYXatJHJzJ8dW-YMgscg9Nm6-KT03w@mail.gmail.com>

OK, thanks for the tips. Knowing the particulars of how SLURM might be
causing this is extremely helpful. I'll try to just empty /tmp before
running MAKER on each node, as you suggest. I suspect that will work but
will work on getting MPI running as well.
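
The clean-up step could look roughly like the Python sketch below, run at the top of the batch script before MAKER starts. It is an illustration only and not something MAKER ships. The function name is made up, and it assumes that nothing else of yours on that node is still using the directory, since it deletes every top-level entry you own.

```python
import os
import shutil


def remove_own_entries(tmp_dir, uid=None):
    """Delete top-level entries in tmp_dir owned by uid (default: current user).

    Returns the sorted names that were removed. Entries owned by other
    users, or that cannot be removed, are skipped.
    """
    uid = os.getuid() if uid is None else uid
    # Read the listing first so we do not delete while still iterating.
    with os.scandir(tmp_dir) as listing:
        entries = list(listing)
    removed = []
    for entry in entries:
        try:
            if entry.stat(follow_symlinks=False).st_uid != uid:
                continue
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.remove(entry.path)
        except OSError:
            continue
        removed.append(entry.name)
    return sorted(removed)
```

Calling remove_own_entries("/tmp") on each node is the blunt version. Pointing it at a job-specific subdirectory is safer when several of your jobs share a node.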

Thanks!
Ben


-- 
_____________________________________________________
Benjamin ER Rubin, PhD
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

From lmainzer at life.illinois.edu  Mon Jan  9 01:02:01 2017
From: lmainzer at life.illinois.edu (Liudmila Sergeevna Mainzer)
Date: Mon, 9 Jan 2017 01:02:01 -0600
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
Message-ID: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>

Hello, MAKER developers!

I tried submitting this bug report through the web form on the 
RepeatMasker web page, but I am getting an "invalid submission" message, 
so I decided to post here.

I found a weird bug that results in the notorious "index out of bounds" 
error reported by RepeatMasker. Significantly, this error only arises on 
very long file names generated by MAKER.

I traced this through the code and found that the error originates in 
Tandem Repeats Finder (TRF). TRF sometimes splits its output into separate 
files. When that happens, the pieces with index >1 do not contain the 
sequence name. Compare the first few lines of these two files:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html

<HTML><HEAD><TITLE>InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html</TITLE></HEAD><BODY 

     bgcolor="#File 1 of 2 FBF8BC"><PRE>
     Tandem Repeats Finder Program written by:
                   Gary Benson
                   Program in Bioinformatics
                   Boston University
     Version 4.09
     Sequence: InputSequencefrag-1 CHUNK number:191 size:455659 offset:57300000
     Parameters: 2 3 5 75 20 33 7

etcetera
But also the second chunk:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html

<HTML><HEAD><TITLE>InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html</TITLE></HEAD><BODY 

     bgcolor="#File 2 of 2 Found at i:56286 original size:1 final size:1
     <A NAME="56278--56322,1,45.0,1,1136"></A><A
     HREF="http://tandem.bu.edu/trf/trf.definitions.html#alignment" target="explanation">Alignment explanation</A><BR><BR>
        Indices: 56278--56322  Score: 55
        Period size: 1  Copynumber: 45.0  Consensus size: 1

etcetera


See how one file has the full header with the "Sequence:" statement and 
the other one does not? This "Sequence:" statement is used in the 
RepeatMasker code to name each piece of sequence that ends up being 
masked later. When this variable is empty (the name string is not 
defined), the setSubstr subroutine in the main RepeatMasker code breaks: 
length of an undefined string is of course zero, and that subroutine has 
a check for sequences whose length is shorter than the region that needs 
to be masked.

So it quits with the statement "Error index out of bounds!", even though 
the sequence is finite length, does not have any weird characters, and 
is maskable.
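
One way to spot the affected pieces before RepeatMasker trips over them is to look for the "Sequence:" line in each TRF report. The following is a rough Python sketch that assumes only what the two excerpts above show, namely that a complete report carries a line starting with "Sequence:". The helper names and any glob pattern passed in are illustrative.

```python
import glob
import re

# A complete TRF report names its sequence on a line of its own.
SEQUENCE_LINE = re.compile(r"^\s*Sequence:\s*(\S+)", re.MULTILINE)


def sequence_name(report_text):
    """Return the name on the 'Sequence:' line, or None if the line is missing."""
    match = SEQUENCE_LINE.search(report_text)
    return match.group(1) if match else None


def reports_without_header(pattern):
    """List files matching pattern whose text lacks a 'Sequence:' line."""
    missing = []
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            if sequence_name(handle.read()) is None:
                missing.append(path)
    return missing
```

Running reports_without_header("output_folder/*.txt.html") would list the pieces that, per the analysis above, leave the sequence name undefined.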

Once again, this only arises on very long file names, and those seem to 
be created by MAKER. Example:
LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific

Notice how the last part of the file name has a bunch of identifiers 
separated by the %2E (generic URI-encoding)? I experimented with that 
file name. The path does not matter. The % signs do not matter. It is 
the length of the filename itself: if it is <108 characters, then 
RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like 
maybe Perl is not handling that long a name very well...
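
A quick way to find out which files in a run are at risk is to scan for base names at or above that length. The Python sketch below takes the 108-character threshold from the experiment described above, not from any documented limit. The function names are invented, and encoded_length assumes that every "." in a library file name is written as "%2E", as in the example path.

```python
import os

# Threshold from the report above: names of 108 or more characters failed.
MAX_SAFE_NAME_LENGTH = 107


def overlong_names(root, limit=MAX_SAFE_NAME_LENGTH):
    """Yield (length, path) for files under root whose base name exceeds limit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if len(name) > limit:
                yield len(name), os.path.join(dirpath, name)


def encoded_length(name):
    """Length of name once each '.' is written as '%2E' (three characters)."""
    return len(name.replace(".", "%2E"))
```

Because every encoded dot costs two extra characters, a library file name with many dot-separated parts grows quickly once it is embedded in the working file name.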

So the problem has three parts: MAKER creates very long file names, TRF 
fails to write the file headers properly for those names, and 
RepeatMasker then breaks.

Could you suggest any workarounds or patches for this problem? It is 
forcing us to run RepeatMasker separately, outside the main MAKER 
workflow, which really complicates the data management and analysis as a 
whole.
We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl 
v5.10.1 built for x86_64-linux-thread-multi.

Many thanks in advance,
Liudmila Mainzer

----------------
Senior Research Scientist
National Center for Supercomputing Applications

Research Assistant Professor
Institute of Genomic Biology

University of Illinois
217-300-0568
1205 W. Clark St. Room 4026
Urbana, IL 61801


From carsonhh at gmail.com  Mon Jan  9 10:30:09 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 9 Jan 2017 09:30:09 -0700
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
In-Reply-To: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>
References: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>
Message-ID: <733D5263-6CFC-4AB3-BFDD-30330B0E1985@gmail.com>

The name used by MAKER is based on the input file name, so a quick fix would be to rename your input file to have a shorter name.

--Carson




From qlian003 at ucr.edu  Wed Jan 11 23:28:32 2017
From: qlian003 at ucr.edu (Qihua Liang)
Date: Wed, 11 Jan 2017 21:28:32 -0800
Subject: [maker-devel] gff file: possible sources
Message-ID: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>

Hi MAKER development team!

I am trying to understand the second column of the GFF file generated by MAKER, which should be the source of the annotation. The tutorial lists the following:

Possible Sources Include:
BLASTN - BLASTN alignment of EST evidence
BLASTX - BLASTX alignment of protein evidence
TBLASTX - TBLASTX alignment of EST evidence from closely related organisms
EST2Genome - Polished EST alignment from Exonerate
Protein2Genome - Polished protein alignment from Exonerate
SNAP - SNAP ab initio gene prediction
GENEMARK - GeneMark ab initio gene prediction
Augustus - Augustus ab initio gene prediction
FgenesH - FGENESH ab initio gene prediction
Repeatmasker - RepeatMasker identified repeat
RepeatRunner - RepeatRunner identified repeat from the repeat protein database
tRNAScan - tRNAScan-SE tRNA predictions (coming soon)
PASA - PASA gene predictions (coming soon)

I noticed other sources in my GFF file, such as cdna2genome. Is there any more detailed documentation explaining sources beyond those listed above?

Thanks
Qihua


From dence at genetics.utah.edu  Thu Jan 12 07:28:24 2017
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 12 Jan 2017 13:28:24 +0000
Subject: [maker-devel] gff file: possible sources
In-Reply-To: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>
References: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>
Message-ID: <DE48F3CB-8B72-43A6-8331-ED1B811CDCCE@genetics.utah.edu>

Hi Qihua, the cdna2genome is the polished tblastx alignments from Exonerate. Basically, the source column should be the name of the tool that generated the alignment, prediction, or gene model.
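
To see every source that actually occurs in a given file, the second column can simply be tallied. The following is a minimal Python sketch that assumes standard nine-column, tab-separated GFF3 with an optional ##FASTA section at the end. The function name is made up for this example.

```python
from collections import Counter


def count_sources(gff_lines):
    """Tally column 2 (source) over the feature lines of GFF3 text."""
    tally = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            tally[fields[1]] += 1
    return tally
```

Passing an open handle on the MAKER GFF3 file returns a Counter keyed by source name, so any source the tutorial list does not mention stands out immediately.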

~Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From patel.kumar.vipul at gmail.com  Fri Jan 20 02:44:26 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 09:44:26 +0100
Subject: [maker-devel] Maker crash for long chrm.
Message-ID: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>

Hi,

I hope someone can help me to figure out what is actually going wrong.

I installed Maker 2.31.9, MPICH, and BioPerl 1.7 via CPAN, and pointed the TMP
variable at a location that is not on NFS. The bundled test case, as well as
small contigs (between 1 kb and 1 MB), runs without any problems.

When I apply it to a larger sequence, for example one of 57 MB, it fails. I
also tried different sequences of around 60 MB, with the same outcome.

I looked into the logs, but they were not really helpful, as they just stated
that the job failed.

It crashed with following message:

deleted:0 genes
substr outside of string at /usr/share/perl/5.18/Carp.pm line 165.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/Bio/Root/Root.pm:447
STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/Bio/Tools/CodonTable.pm:419
STACK: CGL::TranslationMachine::longest_translation_plus_stop programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280
STACK: maker::auto_annotator::get_translation_seq programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236
STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../lib/Widget/snap.pm:974
STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/snap.pm:690
STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194
STACK: Process::MpiChunk::_go programs/maker/maker/bin/../lib/Process/MpiChunk.pm:1469
STACK: Process::MpiChunk::run programs/maker/maker/bin/../lib/Process/MpiChunk.pm:341
STACK: programs/maker/maker/bin/maker:979
-----------------------------------------------------------
--> rank=16, hostname=dummy
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:chr_test

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:chr_test

examining contents of the fasta file and run log


--Next Contig--

Processing run.log file...

I get the same message when I run it without MPI, so I guess it is not an
MPI issue.
How can I find out whether some jobs died, which might have led to this
problem?
Any other ideas on how I can tackle this problem?

Kind regards

From patel.kumar.vipul at gmail.com  Fri Jan 20 07:34:28 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 14:34:28 +0100
Subject: [maker-devel] Maker crash for long chrm.
In-Reply-To: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
References: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
Message-ID: <CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>

Solved. After some digging and printing I found out the problem.

It was snap itself!

For anybody who runs into the same problem: check snap. Apparently it
was not compiled correctly and therefore produced non-conforming output.
Recompiling solved my issue.
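
One way to spot this kind of problem earlier is to sanity-check the raw snap output before MAKER parses it. The Python sketch below is only an illustration. It assumes the ZFF layout as I understand it: '>' header lines followed by feature lines with either 4 whitespace-separated fields (short form) or 9 fields (long form), with integer coordinates in fields 2 and 3. The function name `check_zff` and the example lines are hypothetical.

```python
def check_zff(lines):
    """Return (line_number, reason) tuples for lines that do not look like ZFF.

    Assumed layout: '>' header lines, then feature lines with either
    4 fields (short form) or 9 fields (long form).
    """
    problems = []
    seen_header = False
    for n, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            seen_header = True
            continue
        if not seen_header:
            problems.append((n, "feature before any '>' header"))
        fields = line.split()
        if len(fields) not in (4, 9):
            problems.append((n, f"expected 4 or 9 fields, got {len(fields)}"))
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            problems.append((n, "non-integer coordinates"))
    return problems


# Made-up lines, only to show a passing and a failing case.
good = [">chr_test", "Einit\t201\t300\t+\t12.5\t0\t1\t0\tchr_test-snap.1"]
bad = [">chr_test", "Einit\t201\tgarbage"]

print(check_zff(good))
print(check_zff(bad))
```

An empty list does not prove the predictions are sensible, only that the file has the expected shape. A non-empty list is a hint that the snap binary itself may be worth rebuilding.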

Kind regards


From carsonhh at gmail.com  Fri Jan 20 16:00:49 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 20 Jan 2017 15:00:49 -0700
Subject: [maker-devel] Maker crash for long chrm.
In-Reply-To: <CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>
References: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
	<CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>
Message-ID: <59841676-741F-496D-9E47-7750417033A4@gmail.com>

I'm glad it's working for you. Let us know if anything else comes up.

--Carson


From mayabritstein at gmail.com  Mon Jan 23 02:30:40 2017
From: mayabritstein at gmail.com (Maya Britstein)
Date: Mon, 23 Jan 2017 10:30:40 +0200
Subject: [maker-devel] Authorization failed.
Message-ID: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>

Hi,

I can't access the maker-devel archives. I am entering my email, and what I
think is my password, but still it doesn't work.

thanks,

Maya

From bmoore at genetics.utah.edu  Mon Jan 23 06:43:53 2017
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon, 23 Jan 2017 12:43:53 +0000
Subject: [maker-devel] Authorization failed.
In-Reply-To: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>
References: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>
Message-ID: <E0148C3A-ACD6-49B2-A39C-C8393D0E9CEA@genetics.utah.edu>

Hi Maya,

If you follow the link below, you will find at the bottom of the page a portion of the form that allows you to reset your password.  It's a little misleading because it looks like it's only an "Unsubscribe" option, but it also takes you to a page that allows you to update your subscription details, including password reminder/reset.  The actual text for the portion of the page you're looking for is this:

'To unsubscribe from maker-devel, get a password reminder, or change your subscription options enter your subscription email address:'

The link is:

http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Regards,

Barry


From daren.card at gmail.com  Tue Jan 24 08:06:22 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Tue, 24 Jan 2017 08:06:22 -0600
Subject: [maker-devel] Maker error: Invalid nucleotide
Message-ID: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>

Hi everyone,

I'm getting an error with an ongoing Maker run that I'm trying to troubleshoot. This is on a 2nd Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I'm getting the following type of error message for each contig:

...
Widget::augustus:
/opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus
#-------------------------------#

/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
	Invalid nucleotide '8' encountered.


/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
	Invalid nucleotide '8' encountered.

ERROR: Augustus failed
--> rank=7, hostname=moonunit0
ERROR: Failed while preparing ab-inits
ERROR: Chunk failed at level:0, tier_type:2
FAILED CONTIG:scaffold-92

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold-92

examining contents of the fasta file and run log
...

Augustus is apparently encountering '8' nucleotides, which is weird. I've looked within the contig fasta file in /tmp/ and there are no '8's anywhere except the header lines. Everything else appears to be running without issues.

Any guidance on how I might further interpret and solve this issue would be greatly appreciated. I can provide more information if necessary.

Thanks,
Daren Card

UT-Arlington


From carsonhh at gmail.com  Wed Jan 25 11:37:50 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Jan 2017 10:37:50 -0700
Subject: [maker-devel] Maker error: Invalid nucleotide
In-Reply-To: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>
References: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>
Message-ID: <5E13AB7E-9175-4440-AD62-A53BD9DD8DE1@gmail.com>

Try running the contig in question (scaffold-92) as a separate MAKER run. That may help indicate whether the issue is a corrupt intermediate file (if it is, you can set clean_try=1 to force deletion of intermediate files before the rerun).

--Carson
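
To confirm or rule out stray characters in the file Augustus actually reads (for example the scaffold-92.abinit_masked.0 file under /tmp/ named in the error), a small scan along the following lines may help. It is a minimal Python sketch, not part of MAKER. The function name `find_invalid_bases` and the example records are made up, and the accepted alphabet (IUPAC nucleotide codes, upper and lower case) is an assumption about what the tool tolerates.

```python
# IUPAC nucleotide codes in both cases; soft-masked sequence uses lower case.
VALID = set("ACGTUNRYSWKMBDHVacgtunryswkmbdhv")


def find_invalid_bases(lines):
    """Yield (record_id, line_number, column, char) for each unexpected
    character found on a sequence line of a FASTA file."""
    record = None
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            parts = line[1:].split()
            record = parts[0] if parts else ""
            continue
        for col, ch in enumerate(line, start=1):
            if ch not in VALID:
                yield (record, n, col, ch)


# Made-up record with one bad character on line 3.
example = [">scaffold-92", "ACGTNNAC", "AC8TAC"]
for hit in find_invalid_bases(example):
    print(hit)
```

Because every line that does not start with '>' is treated as sequence, a header that has lost its '>' or has been wrapped onto a second line would also be reported, which is one way digits from a header can end up being read as nucleotides.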




From scott at scottcain.net  Wed Jan 25 14:23:02 2017
From: scott at scottcain.net (Scott Cain)
Date: Wed, 25 Jan 2017 15:23:02 -0500
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
Message-ID: <CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>

Hi Maya,

I'm not sure what MAKER's requirements are in this regard--I'm forwarding
this to their mailing list.

Scott


On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com>
wrote:

> Hi,
>
> I have RNA-seq data, and genomic data that I want to annotate using maker.
>
> From what I understand, I need to generate a GFF3 file from the
> RNA-seq mappings. I mapped the RNA sequences to the genome
> using bowtie and tophat. However, I still do not know how to take this
> output and convert it to a GFF3 file that I can then use in maker as
> annotation evidence.
>
> I saw the wiki page (http://gmod.org/wiki/GFF3), but it does not mention
> how to make this conversion.
>
> Can you please help me?
>
> Sincerely,
> Maya
>
> ----
> Maya Britstein
> Ph.D candidate
> Laura Steindler's Lab
> Marine Biology Department
> Leon H. Charney School of Marine Sciences
> University of Haifa, Israel
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu  Wed Jan 25 16:03:51 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 25 Jan 2017 22:03:51 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
Message-ID: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>

If I recall correctly, starting from a BAM you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for reference-based assembly.  But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. removing low-confidence transcripts) and bringing the result in as EST evidence.
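
To make the reference-based route a bit more concrete, here is a rough Python sketch of the kind of conversion involved: it groups the exon rows of a Cufflinks/StringTie-style GTF by transcript_id and writes one GFF3 match feature per transcript with a match_part child per exon. Several things here are assumptions: the match/match_part feature types, the source label "cufflinks_sketch", the function name and the example rows are all illustrative. If your MAKER install has its own converter in its bin directory (I believe a cufflinks2gff3 script is included, but please check), that is the safer choice.

```python
import re
from collections import defaultdict

# GTF attributes look like: gene_id "G1"; transcript_id "T1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def gtf_to_match_gff3(gtf_lines, source="cufflinks_sketch"):
    """Group GTF exon rows by transcript_id and emit GFF3 match/match_part rows."""
    exons = defaultdict(list)  # transcript_id -> [(seqid, start, end, strand)]
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # only exon rows carry the aligned blocks
        attrs = dict(ATTR_RE.findall(cols[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        exons[tid].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    out = ["##gff-version 3"]
    for tid, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # order exons by start coordinate
        seqid, strand = parts[0][0], parts[0][3]
        start, end = parts[0][1], max(p[2] for p in parts)
        out.append("\t".join([seqid, source, "match", str(start), str(end),
                              ".", strand, ".", f"ID={tid};Name={tid}"]))
        for i, (_, s, e, _) in enumerate(parts, start=1):
            out.append("\t".join([seqid, source, "match_part", str(s), str(e),
                                  ".", strand, ".",
                                  f"ID={tid}.exon{i};Parent={tid}"]))
    return out


# Made-up GTF rows for a single two-exon transcript.
gtf = [
    'chr1\tCufflinks\ttranscript\t100\t500\t1000\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t100\t200\t1000\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "1";',
    'chr1\tCufflinks\texon\t400\t500\t1000\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "2";',
]
print("\n".join(gtf_to_match_gff3(gtf)))
```

The sketch ignores scores and does not check that all exons of a transcript sit on the same sequence and strand, so it is a starting point rather than a finished converter.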

chris


From qwzhang0601 at gmail.com  Thu Jan 26 14:26:42 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Thu, 26 Jan 2017 15:26:42 -0500
Subject: [maker-devel] canonical protein sequences or isoform?
Message-ID: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>

Hello:

I am annotating a new genome and collecting proteins from mouse. I
found that there are both canonical protein sequences and isoforms. I wonder
whether I should use only the canonical protein sequences, or both the
canonical sequences and the isoforms?

Thanks

Best
Quanwei

From rainer.rutka at uni-konstanz.de  Fri Jan 27 04:31:40 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Fri, 27 Jan 2017 11:31:40 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
Message-ID: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>

Hi everybody.

My name is Rainer. I am an administrator for the HPC systems at our
university in Konstanz, Baden-Wuerttemberg, Germany.
The project is called bwHPC-C5.

See: https://www.bwhpc-c5.de/en/index.php

I have been trying to get Maker running on our bwUniCluster for weeks.
Unfortunately, I get errors while running a Maker job in the MPI environment.

BUILD STATUS

==============================================================================
STATUS MAKER v2.31.9
==============================================================================
PERL Dependencies: VERIFIED
External Programs: VERIFIED
External C Libraries: VERIFIED
MPI SUPPORT: ENABLED
MWAS Web Interface: DISABLED
MAKER PACKAGE: CONFIGURATION OK

MODULES / INCLUDES / COMPILERS

# knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9
#
##### (B) Dependencies:
#
# conflict: any other maker version
# module load compiler/gnu/5.2
# module load mpi/openmpi/2.0-gnu-5.2
[...]

MPI/MOAB SUBMIT

[...]
### Queues ###
#MSUB -q fat
#MSUB -l nodes=1:ppn=16
#MSUB -l mem=20gb
#MSUB -l walltime=50:00:00
#
[...]
echo " "
echo "### Loading MAKER module:"
echo " "
module load bio/maker/2.31.9
[ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
echo "MAKER_VERSION = $MAKER_VERSION"
module list
[...]
echo " "
echo "### Runing Maker example"
echo " "
export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so
export OMPI_MCA_mpi_warn_on_fork=0

echo "LD_PRELOAD=${LD_PRELOAD}"
#
# "STATUS: Processing and indexing input FASTA files..."
#
mpiexec -mca btl ^openib -n 16 maker
[...]


E R R O R S
=======
[...]
LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[uc1n338:113607] *** Process received signal ***
[uc1n338:113607] Signal: Segmentation fault (11)
[uc1n338:113607] Signal code: Address not mapped (1)
[uc1n338:113607] Failing at address: 0x4b0
[uc1n338:113608] *** Process received signal ***
[uc1n338:113608] Signal: Segmentation fault (11)
[uc1n338:113608] Signal code: Address not mapped (1)
[uc1n338:113608] Failing at address: 0x4b0
[uc1n338:113621] *** Process received signal ***
[uc1n338:113621] Signal: Segmentation fault (11)
[uc1n338:113621] Signal code: Address not mapped (1)
[uc1n338:113621] Failing at address: 0x4b0
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[...]

What is wrong here?

Thank you for your help!

All the best ,

Rainer

-- 
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413


From michael.s.campbell1 at gmail.com  Fri Jan 27 09:36:11 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 27 Jan 2017 10:36:11 -0500
Subject: [maker-devel] canonical protein sequences or isoform?
In-Reply-To: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>
References: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>
Message-ID: <C9A931ED-273F-4B67-B9C2-32C86166312C@gmail.com>

I give MAKER all isoforms as evidence.

Mike


From qwzhang0601 at gmail.com  Fri Jan 27 10:13:22 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Fri, 27 Jan 2017 11:13:22 -0500
Subject: [maker-devel] transcript assembly of RNA-seq data
Message-ID: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>

Hello:

I wonder which is the best way to make use of RNA-seq data for gene
annotation of a new genome assembly.
(1) De novo assembly without mapping to any genome assembly (e.g. Trinity)?
(2) TopHat+Cufflinks, mapping to the new genome assembly that I want to
annotate?
(3) TopHat+Cufflinks, mapping to a closely related annotated genome (e.g. mouse
or human)?

Thanks

Best
Quanwei

From carsonhh at gmail.com  Fri Jan 27 10:23:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 09:23:40 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
Message-ID: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>

(1) De novo assembly without mapping to any genome assembly (like Trinity)

You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.

--Carson
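
As an illustration of the de novo route, the Python sketch below only builds (and prints) a Trinity command line with --jaccard_clip switched on; it does not run anything. The read file names, CPU count and memory limit are placeholders, the helper name `trinity_command` is made up, and the exact flag names (for instance --max_memory) can differ between Trinity releases, so please check `Trinity --help` for your version. As far as I know --jaccard_clip is intended for paired-end reads.

```python
import shlex


def trinity_command(left, right, out_dir="trinity_out", cpu=8, max_memory="50G"):
    """Build (but do not run) a Trinity command line with --jaccard_clip enabled."""
    return [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--CPU", str(cpu),
        "--max_memory", max_memory,
        "--jaccard_clip",  # reduces merging of neighbouring transcripts
        "--output", out_dir,
    ]


# Placeholder read files; substitute your own paired-end FASTQ files.
cmd = trinity_command("reads_1.fq", "reads_2.fq")
print(shlex.join(cmd))
```

The assembled transcripts (after whatever cleanup you prefer) can then be supplied to MAKER as EST evidence, for example through the est= entry in maker_opts.ctl.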



From cjfields at illinois.edu  Fri Jan 27 16:21:15 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jan 2017 22:21:15 +0000
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
	<4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
Message-ID: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>

Yup, I agree.  Carson, would you know of any instances where HISAT2/STAR+StringTie or reference-based Trinity assemblies were (successfully) used?

chris


From carsonhh at gmail.com  Fri Jan 27 18:53:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 17:53:10 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
	<4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
	<90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
Message-ID: <DA117F8A-20D0-4F99-96E5-CFF4FDAB1799@gmail.com>

No. My experience has just been with regular Trinity de novo assembly. Of course, I'd be interested in anyone else's attempt at this though.

--Carson


> On Jan 27, 2017, at 3:21 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> 
> Yup I agree.  Carson, would you know of any instances where HiSAT2/STAR+Stringtie or reference-based Trinity assemblies were (successfully) used?  
> 
> chris
> 
> From: maker-devel <maker-devel-bounces at yandell-lab.org <mailto:maker-devel-bounces at yandell-lab.org>> on behalf of Carson Holt <carsonhh at gmail.com <mailto:carsonhh at gmail.com>>
> Date: Friday, January 27, 2017 at 10:23 AM
> To: Quanwei Zhang <qwzhang0601 at gmail.com <mailto:qwzhang0601 at gmail.com>>
> Cc: "maker-devel at yandell-lab.org <mailto:maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org <mailto:maker-devel at yandell-lab.org>>
> Subject: Re: [maker-devel] transcript assembly of RNA-seq data
> 
>> (1) De novo assembly without mapping to any genome assembly (like Trinity)
>> 
>> You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.
>> 
>> --Carson
>> 
>> 
>>> On Jan 27, 2017, at 9:13 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>>> 
>>> Hello: 
>>> 
>>> I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly. 
>>> (1) De novo assembly without mapping to any genome assembly (e.g. Trinity)?
>>> (2) Mapping with TopHat+Cufflinks to the new genome assembly that I want to annotate?
>>> (3) Mapping with TopHat+Cufflinks to a closely related annotated genome (e.g. mouse or human)?
>>> 
>>> Thanks
>>> 
>>> Best
>>> Quanwei
>>>  
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 
> 
> 
> 


From carsonhh at gmail.com  Sat Jan 28 14:53:45 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 28 Jan 2017 13:53:45 -0700
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
References: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
Message-ID: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>

Try adding one of the following to your mpiexec command:

1. --mca btl ^openib
2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0

One of these may fix your issue. The first stops OpenMPI from using the InfiniBand (openib) transport; the InfiniBand libraries use registered memory in a way that causes system calls to generate segfaults. It will usually force communication to go over another adapter. The second still uses the InfiniBand adapter, but via TCP over InfiniBand, which indirectly bypasses the problem-causing libraries. The third specifically forces the use of the Ethernet adapter instead of the InfiniBand adapter.
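
A minimal sketch of how the three variants could be lined up for testing. It only prints the candidate command lines and does not launch anything. The process count of 16 and the bare `maker` invocation are assumptions borrowed from the job script quoted below, and the interface names ib0 and eth0 are site-specific (check yours with `ip addr`).

```shell
# Assumed process count; override by exporting NPROCS before running.
NPROCS="${NPROCS:-16}"

# Print one candidate mpiexec command line per set of MCA options.
# Nothing is executed here; copy the line you want to try into the job script.
candidate_commands() {
  while IFS= read -r opts; do
    echo "mpiexec $opts -n $NPROCS maker"
  done <<'EOF'
--mca btl ^openib
--mca btl vader,tcp,self --mca btl_tcp_if_include ib0
--mca btl vader,tcp,self --mca btl_tcp_if_include eth0
EOF
}

candidate_commands
```

Trying the variants one at a time, in this order, makes it easier to tell which transport was responsible if one of them gets past the segfault.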

--Carson


> On Jan 27, 2017, at 3:31 AM, Rainer Rutka <rainer.rutka at uni-konstanz.de> wrote:
> 
> Hi everybody.
> 
> My name is Rainer. I am an administrator for the HPC systems at our
> university in Konstanz, Baden-Württemberg, Germany.
> The project is called bwHPC-C5.
> 
> See: https://www.bwhpc-c5.de/en/index.php
> 
> I have been trying to get Maker running on our bwUniCluster for weeks. Unfortunately
> I get errors while running a Maker job in the MPI environment.
> 
> BUILD STATUS
> 
> ==============================================================================
> STATUS MAKER v2.31.9
> ==============================================================================
> PERL Dependencies: VERIFIED
> External Programs: VERIFIED
> External C Libraries: VERIFIED
> MPI SUPPORT: ENABLED
> MWAS Web Interface: DISABLED
> MAKER PACKAGE: CONFIGURATION OK
> 
> MODULES / INCLUDES / COMPILERS
> 
> # knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9
> #
> ##### (B) Dependencies:
> #
> # conflict: any other maker version
> # module load compiler/gnu/5.2
> # module load mpi/openmpi/2.0-gnu-5.2
> [...]
> 
> MPI/MOAB SUBMIT
> 
> [...]
> ### Queues ###
> #MSUB -q fat
> #MSUB -l nodes=1:ppn=16
> #MSUB -l mem=20gb
> #MSUB -l walltime=50:00:00
> #
> [...]
> echo " "
> echo "### Loading MAKER module:"
> echo " "
> module load bio/maker/2.31.9
> [ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
> echo "MAKER_VERSION = $MAKER_VERSION"
> module list
> [...]
> echo " "
> echo "### Running Maker example"
> echo " "
> export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so
> export OMPI_MCA_mpi_warn_on_fork=0
> 
> echo "LD_PRELOAD=${LD_PRELOAD}"
> #
> # "STATUS: Processing and indexing input FASTA files..."
> #
> mpiexec -mca btl ^openib -n 16 maker
> [...]
> 
> 
> E R R O R S
> =======
> [...]
> LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> [uc1n338:113607] *** Process received signal ***
> [uc1n338:113607] Signal: Segmentation fault (11)
> [uc1n338:113607] Signal code: Address not mapped (1)
> [uc1n338:113607] Failing at address: 0x4b0
> [uc1n338:113608] *** Process received signal ***
> [uc1n338:113608] Signal: Segmentation fault (11)
> [uc1n338:113608] Signal code: Address not mapped (1)
> [uc1n338:113608] Failing at address: 0x4b0
> [uc1n338:113621] *** Process received signal ***
> [uc1n338:113621] Signal: Segmentation fault (11)
> [uc1n338:113621] Signal code: Address not mapped (1)
> [uc1n338:113621] Failing at address: 0x4b0
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> [...]
> 
> WHATS WRONG HERE!?
> 
> Thank you for your help!
> 
> All the best ,
> 
> Rainer
> 
> -- 
> Rainer Rutka
> University of Konstanz
> Communication, Information, Media Centre (KIM)
> * High-Performance-Computing (HPC)
> * KIM-Support and -Base-Services
> Room: V511
> 78457 Konstanz, Germany
> +49 7531 88-5413
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From rainer.rutka at uni-konstanz.de  Mon Jan 30 02:32:08 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Mon, 30 Jan 2017 09:32:08 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
References: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
	<73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
Message-ID: <c89c60e5-1162-1297-5d71-99b1cbf315ec@uni-konstanz.de>

Hi Carson!

Thank you VERY MUCH for your hints.

Much appreciated!

I'll test these today and let you know about the results.

Again: THANKS! :-)

BTW: I'm not a scientist. Only a system operator.

:-)

Am 28.01.2017 um 21:53 schrieb Carson Holt:
> Try adding one of the following to your mpiexec command:
> 1. --mca btl ^openib
> 2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
> 3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0
> One of these may fix your issue. The first stops OpenMPI from using the InfiniBand (openib) transport; the InfiniBand libraries use registered memory in a way that causes system calls to generate segfaults. It will usually force communication to go over another adapter. The second still uses the InfiniBand adapter, but via TCP over InfiniBand, which indirectly bypasses the problem-causing libraries. The third specifically forces the use of the Ethernet adapter instead of the InfiniBand adapter.
> --Carson

-- 
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413


From qwzhang0601 at gmail.com  Tue Jan 31 11:36:13 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 12:36:13 -0500
Subject: [maker-devel] collecting protein sequences as evidences
Message-ID: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>

I wonder what's the best way to collect protein sequences for gene
annotation of a de novo genome assembly.
(1) My first choice is to get protein sequences of human and mouse from
UniProt. At this step, I am not clear whether I should download the
reviewed ones (i.e., SWISS-prot) or automatically annotated ones (i.e.,
TrEMBL).
(2) On the other hand, I also have protein sequences from NCBI. Should I
simply merge those FASTA files? Does it matter if there are
redundancies? And also, if I get protein sequences from different sources,
they may not have the same quality. Do I need to do something before I
integrate protein sequences from different sources?

Many thanks

Best
Quanwei

From qwzhang0601 at gmail.com  Tue Jan 31 13:08:21 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 14:08:21 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from different
	tissues and individuals
Message-ID: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>

Hello:

I am trying to assemble transcripts using RNA-seq data by the tool Trinity,
which will be used for gene annotation for Maker. Now I have data from two
tissues with two replicates each. Should I merge all four samples to get
one assembly file? Or should I merge replicates of each tissue separately
and use the two assembly files as input of Maker. Merging all samples into
one, we will have much higher coverage level, but I think there may be some
genes expressed by tissue-specific isoforms. So I am not sure whether I should
merge RNA-seq from different tissues.
What's more, I find some published RNA-seq data from another individual
(and also for different tissue from us) for the same species. Should I
merge all RNA-seq together (across individuals and tissues)? Or should I
generate different transcript assembly and use all those assemblies as
input to Maker?

Thanks
Best
Quanwei

From michael.s.campbell1 at gmail.com  Tue Jan 31 13:26:29 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 14:26:29 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from
	different tissues and individuals
In-Reply-To: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
Message-ID: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>

I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER as a comma-separated list in the opts file.

Example:
est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta
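
A small sketch of how such a line could be built when there are several per-tissue assemblies. The helper name `est_line` and the file names are hypothetical; the output is meant to be pasted into the opts file (maker_opts.ctl) by hand.

```shell
# Join the given FASTA paths with commas into a MAKER-style "est=" line.
est_line() {
  joined=""
  for f in "$@"; do
    # Prepend a comma only when something has already been collected.
    joined="${joined:+$joined,}$f"
  done
  echo "est=$joined"
}

# Hypothetical per-tissue Trinity assemblies (replicates merged before assembly).
est_line /PATH/TO/tissue1_Trinity.fasta /PATH/TO/tissue2_Trinity.fasta
# prints: est=/PATH/TO/tissue1_Trinity.fasta,/PATH/TO/tissue2_Trinity.fasta
```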

Good luck,
Mike

> On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Hello:
> 
> I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues.
> What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker?
> 
> Thanks
> Best
> Quanwei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From michael.s.campbell1 at gmail.com  Tue Jan 31 14:57:28 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 15:57:28 -0500
Subject: [maker-devel] collecting protein sequences as evidences
In-Reply-To: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>
References: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>
Message-ID: <2E4D90C9-6D6E-4F52-A361-AFB06A61D2C2@gmail.com>

Hi Quanwei,

(1) When I use UniProt I use Swiss-Prot and not TrEMBL.
(2) I don't merge files together. I just pass them all to MAKER as a comma-separated list.

Thanks,
Mike

> On Jan 31, 2017, at 12:36 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> I wonder what's the best way to collect protein sequences for gene annotation of a de novo genome assembly. 
> (1) My first choice is to get protein sequences of human and mouse from UniProt. At this step, I am not clear whether I should download the reviewed ones (i.e., SWISS-prot) or automatically annotated ones (i.e., TrEMBL). 
> (2) On the other hand, I also have protein sequences from NCBI. Should I simply merge those FASTA files? Does it matter if there are redundancies? Also, if I get protein sequences from different sources, they may not have the same quality. Do I need to do something before I integrate protein sequences from different sources? 
> 
> Many thanks
> 
> Best
> Quanwei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From cjfields at illinois.edu  Tue Jan 31 15:05:43 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 21:05:43 +0000
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
Message-ID: <A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>

I agree with Mike.  I also suggest not combining RNA-seq data from different runs (e.g. different studies) even if they are from the same tissue, developmental stage, etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc.) that can significantly and negatively impact transcript assembly quality.

chris

On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" <maker-devel-bounces at yandell-lab.org on behalf of michael.s.campbell1 at gmail.com> wrote:

    I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file.
    
    Example: 
    est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta
    
    Good luck,
    Mike
    
    > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
    > 
    > Hello:
    > 
    > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues.
    > What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker?
    > 
    > Thanks
    > Best
    > Quanwei
    > _______________________________________________
    > maker-devel mailing list
    > maker-devel at box290.bluehost.com
    > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
    
    
    _______________________________________________
    maker-devel mailing list
    maker-devel at box290.bluehost.com
    http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
    

From mjfi2sb3 at gmail.com  Tue Jan 31 15:14:14 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Tue, 31 Jan 2017 21:14:14 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
	<357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
Message-ID: <CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>

Hi Christopher,

How would you identify a low confidence transcript? And how do you remove
them? Also, did you try setting a minimum read coverage in Trinity as the
default is one?

Best
/SB

On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, <cjfields at illinois.edu>
wrote:

> If I recall, from a BAM you would need to run a reference-based assembly
> on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use
> Trinity for ref-based assembly.  But I always choose the route of a full de
> novo assembly (again, Trinity or similar) when possible, doing some basic
> cleanup (e.g. remove low confidence transcripts) and bring them as EST
> evidence.
>
> chris
>
> From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of
> Scott Cain <scott at scottcain.net>
> Date: Wednesday, January 25, 2017 at 2:23 PM
> To: Maya Britstein <mayabritstein at gmail.com>
> Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "
> help at gmod.org" <help at gmod.org>
> Subject: Re: [maker-devel] GFF3 file format
>
> Hi Maya,
>
> I'm not sure what MAKER's requirements are in this regard--I'm forwarding
> this to their mailing list.
>
> Scott
>
>
> On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com>
> wrote:
>
> Hi,
>
> I have RNA-seq data, and genomic data that I want to annotate using maker.
>
> From what I understood, I need to generate a GFF3 file from the
> RNA-seq mapping results. I mapped the RNA sequences to the genome
> using bowtie and tophat. However, I still do not know how to take these
> files and convert them to a GFF3 file that I can then use in maker as
> annotation evidence.
>
> I saw the wiki page, but it did not mention how to make this conversion
> (http://gmod.org/wiki/GFF3).
>
> Can you please help me?
>
> Sincerely,
> Maya
>
> ----
> Maya Britstein
> Ph.D candidate
> Laura Steindler's Lab
> Marine Biology Department
> Leon H. Charney School of Marine Sciences
> University of Haifa, Israel
>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-- 

____________________________
Sent from Inbox Mobile

From qwzhang0601 at gmail.com  Tue Jan 31 15:33:12 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 16:33:12 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
Message-ID: <CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>

Thank you guys for your suggestions. So you do not suggest using RNA-seq
data from another study, even if I assemble them separately and then provide
both assemblies to Maker as a comma-separated list? The issues you
mentioned do exist, but some people did collect RNA-seq data from different
individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198).
But thank you for your suggestions, I will think about it.

Best
Quanwei

2017-01-31 16:05 GMT-05:00 Fields, Christopher J <cjfields at illinois.edu>:

> I agree with Mike.  I also suggest not combining RNA-seq data from different
> runs (e.g. different studies) even if they are from the same tissue,
> developmental stage, etc. There are many other factors (biological variation,
> sample quality, sequencing chemistry or technology differences, etc.) that
> can significantly and negatively impact transcript assembly quality.
>
> chris
>
> On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" <
> maker-devel-bounces at yandell-lab.org on behalf of
> michael.s.campbell1 at gmail.com> wrote:
>
>     I would probably try merging the replicates but not the tissues. You
> can then pass the output files to MAKER in a comma separated list in the
> opts file.
>
>     Example:
>     est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta
>
>     Good luck,
>     Mike
>
>     > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com>
> wrote:
>     >
>     > Hello:
>     >
>     > I am trying to assemble transcripts using RNA-seq data by the tool
> Trinity, which will be used for gene annotation for Maker. Now I have data
> from two tissues with two replicates each. Should I merge all four samples
> to get one assembly file? Or should I merge replicates of each tissue
> separately and use the two assembly files as input of Maker. Merging all
> samples into one, we will have much higher coverage level, but I think
> there may be some genes expressed by tissue-specific isoforms. So I not
> sure whether I should merge RNA-seq from different tissues.
>     > What's more, I find some published RNA-seq data from another
> individual (and also for different tissue from us) for the same species.
> Should I merge all RNA-seq together (across individuals and tissues)? Or
> should I generate different transcript assembly and use all those
> assemblies as input to Maker?
>     >
>     > Thanks
>     > Best
>     > Quanwei
>     > _______________________________________________
>     > maker-devel mailing list
>     > maker-devel at box290.bluehost.com
>     > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>     _______________________________________________
>     maker-devel mailing list
>     maker-devel at box290.bluehost.com
>     http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>

From carsonhh at gmail.com  Tue Jan 31 15:35:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 31 Jan 2017 14:35:20 -0700
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
	<CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
Message-ID: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>

I think he means not to combine them for the transcript assembly step (i.e. assemble them separately). But you still provide them all to MAKER as a comma-separated list.

--Carson
 
> On Jan 31, 2017, at 2:33 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Thank you guys for your suggestions. So you do not suggest to use RNA-seq data from another study, even I assemble them separately and then provide both assemblies into Maker as a comma separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it.
> 
> Best
> Quanwei 
> 
> 2017-01-31 16:05 GMT-05:00 Fields, Christopher J <cjfields at illinois.edu>:
> I agree with Mike.  I also suggest not combining RNA-seq data from different runs (e.g. different studies) even if they are from the same tissue, developmental stage, etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc.) that can significantly and negatively impact transcript assembly quality.
> 
> chris
> 
> On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" <maker-devel-bounces at yandell-lab.org on behalf of michael.s.campbell1 at gmail.com> wrote:
> 
>     I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file.
> 
>     Example:
>     est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta
> 
>     Good luck,
>     Mike
> 
>     > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>     >
>     > Hello:
>     >
>     > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues.
>     > What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker?
>     >
>     > Thanks
>     > Best
>     > Quanwei
>     > _______________________________________________
>     > maker-devel mailing list
>     > maker-devel at box290.bluehost.com
>     > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
>     _______________________________________________
>     maker-devel mailing list
>     maker-devel at box290.bluehost.com
>     http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From cjfields at illinois.edu  Tue Jan 31 17:05:43 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 23:05:43 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
	<357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
	<CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>
Message-ID: <8BD384C9-4E46-42AC-A59F-96299EF5E104@illinois.edu>

You can use RSEM for some initial filtering:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification#filtering-transcripts

Then I generally use the Trinity QA steps, in particular TransRate or DETONATE:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment
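
As a rough illustration of what expression-based filtering amounts to, here is a plain awk sketch. It is not the Trinity script described at the first link, only an assumption-laden stand-in: it relies on the standard RSEM `*.isoforms.results` layout (TPM in column 6), and the function names, file names and the cutoff of 1 TPM are hypothetical choices.

```shell
# List transcript IDs whose TPM (column 6 of an RSEM *.isoforms.results file)
# is at least the given threshold. The header line is skipped.
keep_ids() {
  # $1 = RSEM isoforms.results file, $2 = minimum TPM (hypothetical cutoff)
  awk -F'\t' -v min="$2" 'NR > 1 && $6 + 0 >= min + 0 { print $1 }' "$1"
}

# Print only the FASTA records whose ID (first word after ">") is in the ID list.
filter_fasta() {
  # $1 = file of IDs to keep, $2 = FASTA file
  awk 'NR == FNR { keep[$1] = 1; next }
       /^>/      { id = substr($1, 2); show = (id in keep) }
       show' "$1" "$2"
}

# Example usage (file names are placeholders):
#   keep_ids RSEM.isoforms.results 1 > keep.ids
#   filter_fasta keep.ids Trinity.fasta > Trinity.filtered.fasta
```

A fixed TPM cutoff is a blunt instrument and can drop genuinely low-expressed transcripts, so the quality assessment tools linked above are still worth running on the filtered set.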

chris

From: Salim Bougouffa <mjfi2sb3 at gmail.com>
Date: Tuesday, January 31, 2017 at 3:14 PM
To: Chris Fields <cjfields at illinois.edu>, Scott Cain <scott at scottcain.net>, Maya Britstein <mayabritstein at gmail.com>
Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "help at gmod.org" <help at gmod.org>
Subject: Re: [maker-devel] GFF3 file format


Hi Christopher,

How would you identify a low confidence transcript? And how do you remove them? Also, did you try setting a minimum read coverage in Trinity as the default is one?

Best
/SB

On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, <cjfields at illinois.edu> wrote:
If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly.  But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. remove low confidence transcripts) and bring them as EST evidence.

chris

From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of Scott Cain <scott at scottcain.net>
Date: Wednesday, January 25, 2017 at 2:23 PM
To: Maya Britstein <mayabritstein at gmail.com>
Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "help at gmod.org" <help at gmod.org>
Subject: Re: [maker-devel] GFF3 file format

Hi Maya,

I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list.

Scott


On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com> wrote:
Hi,

I have RNA-seq data, and genomic data that I want to annotate using maker.

From what I understood, I need to generate a GFF3 file from the RNA-seq mapping results. I mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to take these files and convert them to a GFF3 file that I can then use in maker as annotation evidence.

I saw the wiki page, but it did not mention how to make this conversion (http://gmod.org/wiki/GFF3).

Can you please help me?

Sincerely,
Maya

----
Maya Britstein
Ph.D candidate
Laura Steindler's Lab
Marine Biology Department
Leon H. Charney School of Marine Sciences
University of Haifa, Israel


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
--

____________________________
Sent from Inbox Mobile
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20170131/25e09f2b/attachment.html>

From cjfields at illinois.edu  Tue Jan 31 17:07:44 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 23:07:44 +0000
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
	<CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
	<656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>
Message-ID: <CAE4C80D-DD8F-4A2C-A33B-535456D233AE@illinois.edu>

Exactly

chris

From: Carson Holt <carsonhh at gmail.com>
Date: Tuesday, January 31, 2017 at 3:35 PM
To: Quanwei Zhang <qwzhang0601 at gmail.com>
Cc: Chris Fields <cjfields at illinois.edu>, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals

I think he means not to combine them for the transcript assembly preparation (i.e. assemble them separately). But you still provide them all to MAKER as a comma separated list.

--Carson
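
As a rough illustration of the comma separated list described above, here is a minimal Python sketch that builds the `est=` line for the opts file from separately assembled transcript FASTA files. The function name and the example paths are hypothetical; only the `est=` key and the comma separated form come from this thread.

```python
def est_line(assemblies):
    """Build an 'est=' line for the MAKER opts file from FASTA paths."""
    paths = [str(p).strip() for p in assemblies if str(p).strip()]
    for p in paths:
        # A comma inside a path would be read as a list separator.
        if "," in p:
            raise ValueError(f"path contains a comma: {p!r}")
    return "est=" + ",".join(paths)


# Hypothetical per-tissue assemblies, each assembled separately.
print(est_line(["/PATH/TO/tissue1.fasta", "/PATH/TO/tissue2.fasta"]))
```

Each assembly stays in its own file; only the list handed to MAKER is combined.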

On Jan 31, 2017, at 2:33 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:

Thank you guys for your suggestions. So you do not suggest using RNA-seq data from another study, even if I assemble them separately and then provide both assemblies to Maker as a comma separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it.
Best
Quanwei

2017-01-31 16:05 GMT-05:00 Fields, Christopher J <cjfields at illinois.edu>:
I agree with Mike.  I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality.

chris

On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" <maker-devel-bounces at yandell-lab.org on behalf of michael.s.campbell1 at gmail.com> wrote:

    I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file.

    Example:
    est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta

    Good luck,
    Mike

    > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
    >
    > Hello:
    >
    > I am trying to assemble transcripts from RNA-seq data with Trinity, to be used as evidence for gene annotation in Maker. I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to Maker? Merging all samples into one would give a much higher coverage level, but I think some genes may be expressed as tissue-specific isoforms. So I am not sure whether I should merge RNA-seq data from different tissues.
    > What's more, I found some published RNA-seq data from another individual of the same species (and from a different tissue than ours). Should I merge all the RNA-seq data together (across individuals and tissues)? Or should I generate separate transcript assemblies and use all of them as input to Maker?
    >
    > Thanks
    > Best
    > Quanwei
    > _______________________________________________
    > maker-devel mailing list
    > maker-devel at box290.bluehost.com
    > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


    _______________________________________________
    maker-devel mailing list
    maker-devel at box290.bluehost.com
    http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20170131/c78c9df7/attachment.html>

From rob.syme at gmail.com  Wed Jan  4 23:41:25 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Thu, 05 Jan 2017 06:41:25 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
Message-ID: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>

Hi all

The MAKER wiki page "Repeat Library Construction - Advanced
<http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced>"
describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded
MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there.
Are they distributed with MAKER or separately? Does anybody know where to
find them?

Thanks!

Rob Syme
Research Associate
Curtin University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170105/299fabc3/attachment-0001.html>

From olegl at volcani.agri.gov.il  Thu Jan  5 03:07:31 2017
From: olegl at volcani.agri.gov.il (Oleg Lovky)
Date: Thu, 5 Jan 2017 10:07:31 +0000
Subject: [maker-devel] Unable to train SNAP
Message-ID: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>

Hello,

I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000).
However, I'm not getting transcript and proteins fasta files at all, despite Maker not giving any errors and everything is listed as finished in the datastore log file.
Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files.

Please advise.

Regards,

Oleg Lovky, MSc.
Research Engineer
Institute of Plant Sciences
ARO, Volcani Center
Cell: 054-4870319
[v95_15]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170105/90c174a8/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 16191 bytes
Desc: image001.png
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170105/90c174a8/attachment-0001.png>

From michael.s.campbell1 at gmail.com  Thu Jan  5 07:54:17 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 5 Jan 2017 09:54:17 -0500
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <3B3F80CA-BFA1-4F0E-A2F1-CA60E8496D5F@gmail.com>

Hi Rob,

There is a link near the bottom of that wiki page, at the end of this line:

"CRL and other custom scripts are available here."

That points to this URL: http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz

Thanks,
Mike
> On Jan 5, 2017, at 1:41 AM, Rob Syme <rob.syme at gmail.com> wrote:
> 
> Hi all
> 
> The MAKER wiki page "Repeat Library Construction - Advanced <http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced>" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately. Does anybody know where to find them?
> 
> Thanks!
> 
> Rob Syme
> Research Associate
> Curtin University
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170105/220eecad/attachment-0001.html>

From rob.syme at gmail.com  Thu Jan  5 18:29:35 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Fri, 06 Jan 2017 01:29:35 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <CAEf4xgcZXf18ZWD9JusvrsyUdLCg_wOe2SuA2d91mnTcug+u1w@mail.gmail.com>

Oh dear. That's embarrassing for me! Sorry for the silly question.

-r

On Thu, 5 Jan 2017 at 14:41 Rob Syme <rob.syme at gmail.com> wrote:

> Hi all
>
> The MAKER wiki page "Repeat Library Construction - Advanced
> <http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced>"
> describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded
> MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there.
> Are they distributed with MAKER or separately. Does anybody know where to
> find them?
>
> Thanks!
>
> Rob Syme
> Research Associate
> Curtin University
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170106/ad9453c6/attachment-0001.html>

From xvazquezc at gmail.com  Thu Jan  5 19:23:17 2017
From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=)
Date: Fri, 6 Jan 2017 13:23:17 +1100
Subject: [maker-devel] Unable to train SNAP
In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
Message-ID: <CAL0hg4EEQk5CWkrni6-o29m_mAOkYjLKqjA8Df04FKJMbfDB8g@mail.gmail.com>

Are you using the -n option with maker2zff? You often get empty genome.ann
and genome.dna files if you don't.

On 5 January 2017 at 21:07, Oleg Lovky <olegl at volcani.agri.gov.il> wrote:

> Hello,
>
>
>
> I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1)
> containing ~50k reads (length ranges from 70 to 12000).
>
> However, I'm not getting transcript and proteins fasta files at all,
> despite Maker not giving any errors and everything is listed as finished in
> the datastore log file.
>
> Furthermore, when trying to use maker2zff I'm getting empty genome.ann and
> genome.dna files.
>
>
>
> Please advise.
>
>
>
> Regards,
>
>
>
> Oleg Lovky, MSc.
>
> Research Engineer
>
> Institute of Plant Sciences
>
> ARO, Volcani Center
>
> Cell: 054-4870319
>
> [image: v95_15]
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170106/7dceb9af/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 16191 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170106/7dceb9af/attachment-0001.png>

From carsonhh at gmail.com  Fri Jan  6 12:28:02 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 6 Jan 2017 12:28:02 -0700
Subject: [maker-devel] Unable to train SNAP
In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
Message-ID: <8F65E561-7450-4B5A-8F1B-4E51C0D25BE2@gmail.com>

The maker2zff script has a number of thresholds that must be reached to avoid filtering all models. If you don't have protein evidence in the dataset, for example, then that filter may always be failing. You may just want to turn all filters off with the -n option, as previously suggested.

--Carson
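
A minimal Python sketch of the suggestion above: build a maker2zff call with filtering turned off via `-n`, then check whether the two ZFF outputs are still empty. It assumes maker2zff takes the GFF3 file as its positional argument; the helper names and the `my_genome.all.gff` path are hypothetical.

```python
import os


def maker2zff_command(gff3_path, no_filter=True):
    """Return the maker2zff argument list; -n turns all filters off."""
    cmd = ["maker2zff"]
    if no_filter:
        cmd.append("-n")
    cmd.append(gff3_path)
    return cmd


def empty_outputs(directory="."):
    """List expected ZFF outputs that are missing or zero bytes long."""
    expected = ["genome.ann", "genome.dna"]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(directory, name))
        or os.path.getsize(os.path.join(directory, name)) == 0
    ]


# Placeholder GFF3 name; pass the list to subprocess.run() to execute it.
print(" ".join(maker2zff_command("my_genome.all.gff")))
```

If `empty_outputs()` still reports both files after a run with `-n`, the GFF3 itself probably contains no gene models, which would match the missing transcript and protein FASTA files.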


> On Jan 5, 2017, at 3:07 AM, Oleg Lovky <olegl at volcani.agri.gov.il> wrote:
> 
> Hello,
>  
> I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000).
> However, I'm not getting transcript and proteins fasta files at all, despite Maker not giving any errors and everything is listed as finished in the datastore log file.
> Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files.
>  
> Please advise.
>  
> Regards,
>  
> Oleg Lovky, MSc.
> Research Engineer
> Institute of Plant Sciences
> ARO, Volcani Center
> Cell: 054-4870319
> <image001.png>
>  
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170106/823f4e75/attachment-0001.html>

From kchilds at msu.edu  Thu Jan  5 07:28:00 2017
From: kchilds at msu.edu (Childs, Kevin)
Date: Thu, 5 Jan 2017 14:28:00 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <6AE4044B-9011-4421-A6F1-FE3B95BBB11D@msu.edu>

Rob,

The scripts can be found in a link at the bottom of this wiki page:

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced

Kevin Childs

---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Center for Genomics-Enabled Plant Science
Plant Biology Department
Michigan State University

kchilds at msu.edu
517-775-2844 (m)
517-884-6926 (o)

http://childslab.plantbiology.msu.edu


> On Jan 5, 2017, at 1:41 AM, Rob Syme <rob.syme at gmail.com> wrote:
> 
> Hi all
> 
> The MAKER wiki page "Repeat Library Construction - Advanced" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately. Does anybody know where to find them?
> 
> Thanks!
> 
> Rob Syme
> Research Associate
> Curtin University
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From brubin at fieldmuseum.org  Fri Jan  6 18:22:10 2017
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Fri, 6 Jan 2017 20:22:10 -0500
Subject: [maker-devel] /tmp full
Message-ID: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>

Hi all,

Maker keeps filling up the /tmp directories on the cluster I am using. It
appears that most of the space is taken with many versions of various blast
databases. I suspect that this issue is partly due to my not using MPI and
instead launching multiple instances of maker (typically 16) in the same
working directory. However, it appears that maker is also leaving some of
these databases in /tmp even after it has died or been killed and they are
piling up.

I am submitting my jobs to the cluster via SLURM but have installed maker
locally rather than system-wide. My system administrator is going to try
creating a larger locally mounted directory on some of the nodes for me but
I wanted to check to see if you have any other suggestions to solve the
issue or make sure that maker cleans up /tmp as aggressively as possible.

I am using maker3-beta.

Thanks for any help,
Ben
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170106/3fb552ff/attachment-0001.html>

From carsonhh at gmail.com  Sat Jan  7 16:29:29 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 7 Jan 2017 16:29:29 -0700
Subject: [maker-devel] /tmp full
In-Reply-To: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
References: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
Message-ID: <DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>

If you use the MPI settings, then all processes will share a single temporary directory; otherwise each will have a separate one, since they can't intercommunicate.

MAKER tries to clean up its files on finish or failure, but if you or the system kill it with certain signals, then it is reaped immediately by the system and not allowed to finish cleaning up. Signals 9 and 19, for example, will do that. If a failure is related to the drive being full or a memory issue, then your system may be hitting it with one of these uncatchable signals. For example, SLURM may use signal 9 or 19 if a process fails to respond to signal 15 in a timely manner (i.e. MAKER may be removing files, but SLURM gets impatient and kills it more aggressively because it thinks the process is not responding). You can always try to empty /tmp as the first step in your batch script, so that files belonging to you are removed before MAKER launches.

--Carson
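
To make the catchable/uncatchable distinction concrete, here is a small Python sketch, assuming a POSIX system. On typical Linux machines signals 15, 9 and 19 are SIGTERM, SIGKILL and SIGSTOP; the numbers can differ on other platforms, so the sketch uses the names. The helper `is_catchable` is hypothetical and only probes whether the OS lets a process install a handler.

```python
import signal


def is_catchable(signum):
    """Return True if this process may install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda s, frame: None)
    except (OSError, ValueError):
        # The OS refuses handlers for SIGKILL and SIGSTOP
        # (ValueError is raised when called outside the main thread).
        return False
    # Put back whatever was installed before the probe.
    signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return True


for sig in (signal.SIGTERM, signal.SIGKILL, signal.SIGSTOP):
    state = "catchable" if is_catchable(sig) else "not catchable"
    print(sig.name, int(sig), state)
```

A process can only run cleanup code in response to a signal it is allowed to handle, which is why temporary files are left behind after a SIGKILL.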


> On Jan 6, 2017, at 6:22 PM, Benjamin Rubin <brubin at fieldmuseum.org> wrote:
> 
> Hi all,
> 
> Maker keeps filling up the /tmp directories on the cluster I am using. It appears that most of the space is taken with many versions of various blast databases. I suspect that this issue is partly due to my not using MPI and instead launching multiple instances of maker (typically 16) in the same working directory. However, it appears that maker is also leaving some of these databases in /tmp even after it has died or been killed and they are piling up. 
> 
> I am submitting my jobs to the cluster via SLURM but have installed maker locally rather than system-wide. My system administrator is going to try creating a larger locally mounted directory on some of the nodes for me but I wanted to check to see if you have any other suggestions to solve the issue or make sure that maker cleans up /tmp as aggressively as possible.
> 
> I am using maker3-beta.
> 
> Thanks for any help,
> Ben
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From brubin at fieldmuseum.org  Sun Jan  8 09:24:36 2017
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Sun, 8 Jan 2017 11:24:36 -0500
Subject: [maker-devel] /tmp full
In-Reply-To: <DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>
References: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
	<DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>
Message-ID: <CAKpVPBLfiYakZ3Ce2q02gYXatJHJzJ8dW-YMgscg9Nm6-KT03w@mail.gmail.com>

OK, thanks for the tips. Knowing the particulars of how SLURM might be
causing this is extremely helpful. I'll try to just empty /tmp before
running MAKER on each node, as you suggest. I suspect that will work but
will work on getting MPI running as well.

Thanks!
Ben
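
One way the "empty /tmp first" step could look, as a hedged Python sketch for a POSIX node: it removes only top-level entries owned by the current user and skips everything else. The function name is hypothetical, and it assumes no other job of yours is using the same directory on that node at the same time, since it would delete that job's files too.

```python
import os
import shutil
from pathlib import Path


def remove_own_entries(tmp_dir):
    """Delete top-level entries in tmp_dir owned by the current user."""
    uid = os.getuid()
    removed = []
    for entry in Path(tmp_dir).iterdir():
        try:
            if entry.lstat().st_uid != uid:
                continue  # someone else's file; leave it alone
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed.append(entry.name)
        except OSError:
            continue  # vanished or not removable; skip it
    return sorted(removed)
```

Calling `remove_own_entries("/tmp")` at the top of the batch script, before MAKER starts, would correspond to the step discussed here.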

On Sat, Jan 7, 2017 at 6:29 PM, Carson Holt <carsonhh at gmail.com> wrote:

> If you use the MPI settings, then all processes will share a single
> temporary directory, otherwise they each will have a separate one since
> they can't intercommunicate.
>
> MAKER tries to cleanup its files on finish or failure, but if you or the
> system kill it with certain signals, then it is reaped immediately by the
> system and not allowed to finish cleaning up. Signals 9 and 19 for example
> will do that. If a failure is related to the drive being full or a memory
> issue, then your system may be hitting it with one of these uncatchable
> signals. For example SLURM may use signal 9 or 19 if a process fails to
> respond to signal 15 in a timely manner (i.e. MAKER may be removing files,
> but SLURM gets impatient and kills it more aggressively because it thinks
> the process is not responding). You can always try and empty /tmp as the
> first step in your batch script, and it will remove files belonging to you
> before launching MAKER.
>
> --Carson
>
>
>
>
> > On Jan 6, 2017, at 6:22 PM, Benjamin Rubin <brubin at fieldmuseum.org>
> wrote:
> >
> > Hi all,
> >
> > Maker keeps filling up the /tmp directories on the cluster I am using.
> It appears that most of the space is taken with many versions of various
> blast databases. I suspect that this issue is partly due to my not using
> MPI and instead launching multiple instances of maker (typically 16) in the
> same working directory. However, it appears that maker is also leaving some
> of these databases in /tmp even after it has died or been killed and they
> are piling up.
> >
> > I am submitting my jobs to the cluster via SLURM but have installed
> maker locally rather than system-wide. My system administrator is going to
> try creating a larger locally mounted directory on some of the nodes for me
> but I wanted to check to see if you have any other suggestions to solve the
> issue or make sure that maker cleans up /tmp as aggressively as possible.
> >
> > I am using maker3-beta.
> >
> > Thanks for any help,
> > Ben
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>


-- 
_____________________________________________________
Benjamin ER Rubin, PhD
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170108/e4efa4cc/attachment-0001.html>

From lmainzer at life.illinois.edu  Mon Jan  9 00:02:01 2017
From: lmainzer at life.illinois.edu (Liudmila Sergeevna Mainzer)
Date: Mon, 9 Jan 2017 01:02:01 -0600
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
Message-ID: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>

Hello, MAKER developers!

I tried submitting this bug report through the web form on the 
RepeatMasker web page, but I am getting an "invalid submission" message, 
so I decided to post here.

I found a weird bug that results in the notorious "index out of bounds" 
error reported by RepeatMasker. Significantly, this error only arises on 
very long file names generated by MAKER.

I traced this through the code, and identified the error to originate in 
Tandem Repeat finder. TRF sometimes splits up its output into separate 
files. When that happens, the pieces with index >1 do not contain the 
sequence name. Compare the first few lines between these two files:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html

<HTML><HEAD><TITLE>InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html</TITLE></HEAD><BODY

     bgcolor="#File 1 of 2 FBF8BC"><PRE>
     Tandem Repeats Finder Program written by:
                   Gary Benson
                   Program in Bioinformatics
                   Boston University
     Version 4.09
     Sequence: InputSequencefrag-1 CHUNK number:191 size:455659 offset:57300000
     Parameters: 2 3 5 75 20 33 7

etcetera
But also the second chunk:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html

<HTML><HEAD><TITLE>InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html</TITLE></HEAD><BODY

     bgcolor="#File 2 of 2 Found at i:56286 original size:1 final size:1
     <A NAME="56278--56322,1,45.0,1,1136"></A><A
     HREF="http://tandem.bu.edu/trf/trf.definitions.html#alignment" target
     ="explanation">Alignment explanation</A><BR><BR>
        Indices: 56278--56322  Score: 55
        Period size: 1  Copynumber: 45.0  Consensus size: 1

etcetera


See how one file has the full header with the "Sequence:" statement and 
the other one does not? This "Sequence:" statement is used in the 
RepeatMasker code to name each piece of sequence that ends up being 
masked later. When this variable is empty (the name string is not 
defined), the setSubstr subroutine in the main RepeatMasker code breaks: 
length of an undefined string is of course zero, and that subroutine has 
a check for sequences whose length is shorter than the region that needs 
to be masked.

So it quits with the statement "Error index out of bounds!", even though 
the sequence is finite length, does not have any weird characters, and 
is maskable.
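
A quick way to spot the affected pieces is to scan the TRF output folder for files whose first lines lack a "Sequence:" statement. The Python sketch below is only a diagnostic under the assumptions of this report (output files ending in `.txt.html`, header within the first 20 lines); the function name is hypothetical.

```python
from pathlib import Path


def files_missing_sequence_line(folder, max_lines=20):
    """Return names of *.txt.html files with no 'Sequence:' line near the top."""
    missing = []
    for path in sorted(Path(folder).glob("*.txt.html")):
        with open(path, errors="replace") as handle:
            # Read at most max_lines lines, like `head -n 20`.
            head = [line for _, line in zip(range(max_lines), handle)]
        if not any(line.strip().startswith("Sequence:") for line in head):
            missing.append(path.name)
    return missing
```

Running it on `output_folder` from the example above would be expected to list the `.2.txt.html` piece but not the `.1.txt.html` one.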

Once again, this only arises on very long file names, and those seem to 
be created by MAKER. Example:
LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific

Notice how the last part of the file name has a bunch of identifiers 
separated by the %2E (generic URI-encoding)? I experimented with that 
file name. The path does not matter. The % signs do not matter. It is 
the length of the filename itself: if it is <108 characters, then 
RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like 
maybe Perl is not handling that long a name very well...

So the problem is three-fold: MAKER creates file names that are 
very-very long, while RepeatMasker breaks due to TRF failing to write 
the file headers properly for those very long file names.
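
For anyone wanting to check their own runs against the length I observed, a minimal Python sketch follows. The 108-character threshold is only what I saw in my experiments, not a documented limit of RepeatMasker or TRF, and the helper name is hypothetical.

```python
import os

# Threshold reported in this thread; not taken from RepeatMasker or TRF docs.
REPORTED_LIMIT = 108


def name_too_long(path, limit=REPORTED_LIMIT):
    """True if the file name (directories excluded) has `limit` or more characters."""
    return len(os.path.basename(path)) >= limit
```

Only the final path component is measured, in line with the observation that the directory part does not matter.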

Would you provide any suggestions or patches for this problem? It is 
forcing us to run RepeatMasker separately, outside the main MAKER 
workflow, which really complicates the data management and analysis as a 
whole.
We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl 
v5.10.1 built for x86_64-linux-thread-multi.

Many thanks in advance,
Liudmila Mainzer

----------------
Senior Research Scientist
National Center for Supercomputing Applications

Research Assistant Professor
Institute of Genomic Biology

University of Illinois
217-300-0568
1205 W. Clark St. Room 4026
Urbana, IL 61801


From carsonhh at gmail.com  Mon Jan  9 09:30:09 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 9 Jan 2017 09:30:09 -0700
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
In-Reply-To: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>
References: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>
Message-ID: <733D5263-6CFC-4AB3-BFDD-30330B0E1985@gmail.com>

The name used by MAKER is based on the input file name, so a quick fix would be to rename your input file to have a shorter name.

--Carson
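
A sketch of how a shorter name could be derived before copying or renaming the input file, in Python. The 40-character target and the function name are arbitrary assumptions; MAKER adds its own pieces to the name, so the target should leave plenty of headroom below the length that triggers the error.

```python
import hashlib
import os


def short_name(path, max_len=40):
    """Derive a shorter file name that keeps the extension and stays unique."""
    base = os.path.basename(path)
    stem, ext = os.path.splitext(base)
    if len(base) <= max_len:
        return base
    # A short hash of the full name keeps distinct inputs distinct.
    tag = hashlib.sha1(base.encode("utf-8")).hexdigest()[:8]
    keep = max_len - len(ext) - len(tag) - 1
    return f"{stem[:max(keep, 0)]}_{tag}{ext}"
```

The returned name can then be used as the target of a copy (for example with `shutil.copy`) and that copy given to MAKER instead of the original file.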


> On Jan 9, 2017, at 12:02 AM, Liudmila Sergeevna Mainzer <lmainzer at life.illinois.edu> wrote:
> 
> Hello, MAKER developers!
> 
> I tried submitting this bug report through the web form on the RepeatMasker web page, but I am getting an "invalid submission" message, so I decided to post here.
> 
> I found a weird bug that results in the notorious "index out of bounds" error reported by RepeatMasker. Significantly, this error only arises on very long file names generated by MAKER.
> 
> I traced this through the code, and identified the error to originate in Tandem Repeat finder. TRF sometimes splits up its output into separate files. When that happens, the pieces with index >1 do not contain the sequence name. Compare the first few lines between these two files:
> 
> head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html
> <HTML><HEAD><TITLE>InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html</TITLE></HEAD><BODY 
> 
>    bgcolor="#File 1 of 2 FBF8BC"><PRE>
>    Tandem Repeats Finder Program written by:
>                  Gary Benson
>                  Program in Bioinformatics
>                  Boston University
>    Version 4.09
>    Sequence: InputSequencefrag-1 CHUNK number:191 size:455659 offset:57300000
>    Parameters: 2 3 5 75 20 33 7
> 
> etcetera
> But also the second chunk:
> 
> head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html
> <HTML><HEAD><TITLE>InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html</TITLE></HEAD><BODY 
> 
>    bgcolor="#File 2 of 2 Found at i:56286 original size:1 final size:1
>    <A NAME="56278--56322,1,45.0,1,1136"></A><A
>    HREF="http://tandem.bu.edu/trf/trf.definitions.html#alignment" target
>    ="explanation">Alignment explanation</A><BR><BR>
>       Indices: 56278--56322  Score: 55
>       Period size: 1  Copynumber: 45.0  Consensus size: 1
> 
> etcetera
> 
> 
> See how one file has the full header with the "Sequence:" statement and the other one does not? This "Sequence:" statement is used in the RepeatMasker code to name each piece of sequence that ends up being masked later. When this variable is empty (the name string is not defined), the setSubstr subroutine in the main RepeatMasker code breaks: length of an undefined string is of course zero, and that subroutine has a check for sequences whose length is shorter than the region that needs to be masked.
> 
> So it quits with the statement "Error index out of bounds!", even though the sequence is finite length, does not have any weird characters, and is maskable.
> 
> Once again, this only arises on very long file names, and those seem to be created by MAKER. Example:
> LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific
> 
> Notice how the last part of the file name has a bunch of identifiers separated by the %2E (generic URI-encoding)? I experimented with that file name. The path does not matter. The % signs do not matter. It is the length of the filename itself: if it is <108 characters, then RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like maybe Perl is not handling that long a name very well...
> 
> So the problem is three-fold: MAKER creates very long file names, TRF fails to write the file headers properly for those names, and RepeatMasker then breaks on the missing headers.
> 
> Would you provide any suggestions or patches for this problem? It is forcing us to run RepeatMasker separately, outside the main MAKER workflow, which really complicates the data management and analysis as a whole.
> We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl v5.10.1 built for x86_64-linux-thread-multi.
> 
> Many thanks in advance,
> Liudmila Mainzer
> 
> ----------------
> Senior Research Scientist
> National Center for Supercomputing Applications
> 
> Research Assistant Professor
> Institute of Genomic Biology
> 
> University of Illinois
> 217-300-0568
> 1205 W. Clark St. Room 4026
> Urbana, IL 61801
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
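The file-name length observation in the quoted report can be checked before a run. The following is a minimal sketch, assuming the reported threshold (names of 108 characters or more fail) holds on your system; the function names and the 107-character limit constant are illustrative and are not part of MAKER, RepeatMasker, or TRF.

```python
import os

# Longest file name that still worked in the report above (an observation
# from one setup, not a documented limit of TRF or RepeatMasker).
MAX_SAFE_NAME_LEN = 107


def long_basenames(paths, limit=MAX_SAFE_NAME_LEN):
    """Return (length, path) pairs whose file name, not full path, exceeds limit."""
    flagged = []
    for path in paths:
        name = os.path.basename(path)
        if len(name) > limit:
            flagged.append((len(name), path))
    return flagged


def scan_tree(root, limit=MAX_SAFE_NAME_LEN):
    """Walk a directory tree (hypothetical helper) and collect over-long file names."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = (os.path.join(dirpath, name) for name in filenames)
        found.extend(long_basenames(candidates, limit))
    return found
```

Only the base name is measured because the report states that the directory path does not matter.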


From qlian003 at ucr.edu  Wed Jan 11 22:28:32 2017
From: qlian003 at ucr.edu (Qihua Liang)
Date: Wed, 11 Jan 2017 21:28:32 -0800
Subject: [maker-devel] gff file: possible sources
Message-ID: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>

Hi Maker develop team!

I am trying to understand the second column of the gff file generated by maker, which should be the source of each annotation. The tutorial lists the following:

Possible Sources Include:
BLASTN - BLASTN alignment of EST evidence
BLASTX - BLASTX alignment of protein evidence
TBLASTX - TBLASTX alignment of EST evidence from closely related organisms
EST2Genome - Polished EST alignment from Exonerate
Protein2Genome - Polished protein alignment from Exonerate
SNAP - SNAP ab initio gene prediction
GENEMARK - GeneMark ab initio gene prediction
Augustus - Augustus ab initio gene prediction
FgenesH - FGENESH ab initio gene prediction
Repeatmasker - RepeatMasker identified repeat
RepeatRunner - RepeatRunner identified repeat from the repeat protein database
tRNAScan - tRNAScan-SE tRNA predictions (coming soon)
PASA - PASA gene predictions (coming soon)

I noticed other sources in my gff file that are not on this list, such as cdna2genome. Is there more detailed documentation explaining the sources beyond those listed above?

Thanks
Qihua


From dence at genetics.utah.edu  Thu Jan 12 06:28:24 2017
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 12 Jan 2017 13:28:24 +0000
Subject: [maker-devel] gff file: possible sources
In-Reply-To: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>
References: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>
Message-ID: <DE48F3CB-8B72-43A6-8331-ED1B811CDCCE@genetics.utah.edu>

Hi Qihua, cdna2genome is the polished tblastx alignment from Exonerate. Basically, the source column should be the name of the tool that generated the alignment, prediction, or gene model.

~Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
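To see which source names actually occur in a given file, the second column can be tallied directly. This is a minimal sketch that assumes a tab-delimited GFF3 file with nine columns and an optional trailing ##FASTA section; the function name is illustrative and is not a MAKER utility.

```python
from collections import Counter


def count_sources(gff_lines):
    """Count how often each value appears in column 2 (source) of GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        # Everything after ##FASTA is sequence data, not features.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[fields[1]] += 1
    return counts
```

With a file on disk this could be called as `count_sources(open("genome.gff"))`, where the file name is a placeholder.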


From patel.kumar.vipul at gmail.com  Fri Jan 20 01:44:26 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 09:44:26 +0100
Subject: [maker-devel] Maker crash for long chrm.
Message-ID: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>

Hi,

I hope someone can help me to figure out what is actually going wrong.

I installed Maker 2.31.9, MPICH, and BioPerl 1.7 via CPAN, and pointed the TMP
variable to a location that is not on NFS. The supplied test case, as well as
small contigs between 1 kb and 1 MB, runs without any problems.

Applying it to a larger sequence, for example one of 57 MB, fails. I also tried
different sequences of around 60 MB, with the same outcome.

I looked into the logs, but they were not really helpful, as they only stated
that the job failed.

It crashed with the following message:

deleted:0 genes
substr outside of string at /usr/share/perl/5.18/Carp.pm line 165.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/local/share/perl/5.18.2/Bio/Root/Root.pm:447
STACK: Bio::Tools::CodonTable::translate
/usr/local/share/perl/5.18.2/Bio/Tools/CodonTable.pm:419
STACK: CGL::TranslationMachine::longest_translation_plus_stop
programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280
STACK: maker::auto_annotator::get_translation_seq
programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236
STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../lib/Widget/
snap.pm:974
STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/
snap.pm:690
STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194
STACK: Process::MpiChunk::_go
programs/maker/maker/bin/../lib/Process/MpiChunk.pm:1469
STACK: Process::MpiChunk::run
programs/maker/maker/bin/../lib/Process/MpiChunk.pm:341
STACK: programs/maker/maker/bin/maker:979
-----------------------------------------------------------
--> rank=16, hostname=dummy
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:chr_test

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:chr_test

examining contents of the fasta file and run log


--Next Contig--

Processing run.log file...

I get the same message if I run it without MPI, so I guess it is not an
MPI issue.
How can I find out whether some jobs died, which might have led to this
problem?
Any other ideas on how I can tackle this problem?

Kind regards
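One way to see which contigs died is to scan the captured MAKER output for the "FAILED CONTIG:" lines shown in the message above. This is a minimal sketch that assumes the output was saved to a text file and that failures are always reported in that form; the function name is illustrative.

```python
import re

# Matches lines such as "FAILED CONTIG:chr_test" as printed in the log above.
FAILED_RE = re.compile(r"FAILED CONTIG:\s*(\S+)")


def failed_contigs(log_lines):
    """Return contig names reported as failed, in first-seen order, without repeats."""
    seen = []
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

The same contig is typically reported at more than one level, so repeats are collapsed.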

From patel.kumar.vipul at gmail.com  Fri Jan 20 06:34:28 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 14:34:28 +0100
Subject: [maker-devel] Maker crash for long chrm.
In-Reply-To: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
References: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
Message-ID: <CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>

Solved. After some digging and printing I found out the problem.

It was snap itself!

For anybody who runs into the same problem: check snap. Apparently it
was not compiled correctly and therefore produced non-conforming output.
Recompiling solved my issue.

Kind regards


From carsonhh at gmail.com  Fri Jan 20 15:00:49 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 20 Jan 2017 15:00:49 -0700
Subject: [maker-devel] Maker crash for long chrm.
In-Reply-To: <CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>
References: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
	<CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>
Message-ID: <59841676-741F-496D-9E47-7750417033A4@gmail.com>

I'm glad it's working for you. Let us know if anything else comes up.

--Carson


From mayabritstein at gmail.com  Mon Jan 23 01:30:40 2017
From: mayabritstein at gmail.com (Maya Britstein)
Date: Mon, 23 Jan 2017 10:30:40 +0200
Subject: [maker-devel] Authorization failed.
Message-ID: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>

Hi,

I can't access the maker-devel archives. I am entering my email, and what I
think is my password, but still it doesn't work.

thanks,

Maya

From bmoore at genetics.utah.edu  Mon Jan 23 05:43:53 2017
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon, 23 Jan 2017 12:43:53 +0000
Subject: [maker-devel] Authorization failed.
In-Reply-To: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>
References: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>
Message-ID: <E0148C3A-ACD6-49B2-A39C-C8393D0E9CEA@genetics.utah.edu>

Hi Maya,

If you follow the link below you will find at the bottom of the page a portion of the form that allows you to reset your password.  It's a little misleading because it looks like it's only an 'Unsubscribe' option, but it also takes you to a page that allows you to update your subscription details including password reminder/reset.  The actual text for the portion of the page you're looking for is this:

'To unsubscribe from maker-devel, get a password reminder, or change your subscription options enter your subscription email address:'

The link is:

http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Regards,

Barry


From daren.card at gmail.com  Tue Jan 24 07:06:22 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Tue, 24 Jan 2017 08:06:22 -0600
Subject: [maker-devel] Maker error: Invalid nucleotide
Message-ID: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>

Hi everyone,

I'm getting an error with an ongoing Maker run that I'm trying to troubleshoot. This is on a 2nd Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I'm getting the following type of error message for each contig:

...
Widget::augustus:
/opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus
#-------------------------------#

/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
	Invalid nucleotide '8' encountered.


/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
	Invalid nucleotide '8' encountered.

ERROR: Augustus failed
--> rank=7, hostname=moonunit0
ERROR: Failed while preparing ab-inits
ERROR: Chunk failed at level:0, tier_type:2
FAILED CONTIG:scaffold-92

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold-92

examining contents of the fasta file and run log
...

Augustus is apparently encountering '8' nucleotides, which is weird. I've looked within the contig fasta file in /tmp/ and there are no '8's anywhere except the header lines. Everything else appears to be running without issues.

Any guidance on how I might further interpret and solve this issue would be greatly appreciated. Can provide more information if necessary.

Thanks,
Daren Card

UT-Arlington
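A visual check for a stray character is easy to get wrong on a large contig, so a scripted scan may be more reliable. This is a minimal sketch that assumes the allowed alphabet is the IUPAC nucleotide codes in either case; what Augustus itself accepts is not stated in this thread, and the names below are illustrative.

```python
# IUPAC nucleotide codes, upper and lower case (assumed alphabet).
VALID = set("ACGTUNRYSWKMBDHVacgtunryswkmbdhv")


def invalid_residues(fasta_lines, allowed=VALID):
    """Yield (line_number, column, character) for unexpected sequence characters."""
    for lineno, line in enumerate(fasta_lines, start=1):
        # Header lines may legitimately contain digits and punctuation.
        if line.startswith(">"):
            continue
        for col, ch in enumerate(line.rstrip("\r\n"), start=1):
            if ch not in allowed:
                yield lineno, col, ch
```

Running this on the masked file that is passed to Augustus, rather than on the original contig, checks the exact input that triggered the error.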


From carsonhh at gmail.com  Wed Jan 25 10:37:50 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Jan 2017 10:37:50 -0700
Subject: [maker-devel] Maker error: Invalid nucleotide
In-Reply-To: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>
References: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>
Message-ID: <5E13AB7E-9175-4440-AD62-A53BD9DD8DE1@gmail.com>

Try running the contig in question (scaffold-92) as a separate MAKER run. That may help indicate whether the issue is a corrupt intermediate file (if it is, you can set clean_try=1 to force deletion of intermediate files before the rerun).

--Carson




From scott at scottcain.net  Wed Jan 25 13:23:02 2017
From: scott at scottcain.net (Scott Cain)
Date: Wed, 25 Jan 2017 15:23:02 -0500
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
Message-ID: <CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>

Hi Maya,

I'm not sure what MAKER's requirements are in this regard--I'm forwarding
this to their mailing list.

Scott


On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com>
wrote:

> Hi,
>
> I have RNA-seq data, and genomic data that I want to annotate using maker.
>
> From what I understood, I need to generate a gff3 file from the
> RNA-seq mapping results. I mapped the RNA sequences to the genome
> using bowtie and tophat. However, I still do not know how to take the
> resulting files and convert them to a gff3 file that I can then use in
> maker as annotation evidence.
>
> I saw the wiki page, but it did not mention how to make this conversion (
> http://gmod.org/wiki/GFF3).
>
> Can you please help me?
>
> Sincerely,
> Maya
>
> ----
> Maya Britstein
> Ph.D candidate
> Laura Steindler's Lab
> Marine Biology Department
> Leon H. Charney School of Marine Sciences
> University of Haifa, Israel
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu  Wed Jan 25 15:03:51 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 25 Jan 2017 22:03:51 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
Message-ID: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>

If I recall correctly, starting from a BAM you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for reference-based assembly.  But I always choose a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. removing low-confidence transcripts) and bringing the result in as EST evidence.

chris


From qwzhang0601 at gmail.com  Thu Jan 26 13:26:42 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Thu, 26 Jan 2017 15:26:42 -0500
Subject: [maker-devel] canonical protein sequences or isoform?
Message-ID: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>

Hello:

I am annotating a new genome and collecting proteins from mouse as evidence. I
found that there are both canonical protein sequences and isoforms. Should I
use only the canonical protein sequences, or both the canonical sequences and
the isoforms?

Thanks

Best
Quanwei

From rainer.rutka at uni-konstanz.de  Fri Jan 27 03:31:40 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Fri, 27 Jan 2017 11:31:40 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
Message-ID: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>

Hi everybody.

My name is Rainer. I am an administrator for the HPC systems at our
university in Konstanz, Baden-Wuerttemberg, Germany.
The project is called bwHPC-C5.

See: https://www.bwhpc-c5.de/en/index.php

I have been trying to get Maker running on our bwUniCluster for weeks.
Unfortunately I get errors when running a Maker job in the MPI environment.

BUILD STATUS

==============================================================================
STATUS MAKER v2.31.9
==============================================================================
PERL Dependencies: VERIFIED
External Programs: VERIFIED
External C Libraries: VERIFIED
MPI SUPPORT: ENABLED
MWAS Web Interface: DISABLED
MAKER PACKAGE: CONFIGURATION OK

MODULES / INCLUDES / COMPILERS

# knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9
#
##### (B) Dependencies:
#
# conflict: any other maker version
# module load compiler/gnu/5.2
# module load mpi/openmpi/2.0-gnu-5.2
[...]

MPI/MOAB SUBMIT

[...]
### Queues ###
#MSUB -q fat
#MSUB -l nodes=1:ppn=16
#MSUB -l mem=20gb
#MSUB -l walltime=50:00:00
#
[...]
echo " "
echo "### Loading MAKER module:"
echo " "
module load bio/maker/2.31.9
[ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
echo "MAKER_VERSION = $MAKER_VERSION"
module list
[...]
echo " "
echo "### Runing Maker example"
echo " "
export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so
export OMPI_MCA_mpi_warn_on_fork=0

echo "LD_PRELOAD=${LD_PRELOAD}"
#
# "STATUS: Processing and indexing input FASTA files..."
#
mpiexec -mca btl ^openib -n 16 maker
[...]


E R R O R S
=======
[...]
LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[uc1n338:113607] *** Process received signal ***
[uc1n338:113607] Signal: Segmentation fault (11)
[uc1n338:113607] Signal code: Address not mapped (1)
[uc1n338:113607] Failing at address: 0x4b0
[uc1n338:113608] *** Process received signal ***
[uc1n338:113608] Signal: Segmentation fault (11)
[uc1n338:113608] Signal code: Address not mapped (1)
[uc1n338:113608] Failing at address: 0x4b0
[uc1n338:113621] *** Process received signal ***
[uc1n338:113621] Signal: Segmentation fault (11)
[uc1n338:113621] Signal code: Address not mapped (1)
[uc1n338:113621] Failing at address: 0x4b0
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[...]

WHATS WRONG HERE!?

Thank you for your help!

All the best ,

Rainer

-- 
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413


From michael.s.campbell1 at gmail.com  Fri Jan 27 08:36:11 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 27 Jan 2017 10:36:11 -0500
Subject: [maker-devel] canonical protein sequences or isoform?
In-Reply-To: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>
References: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>
Message-ID: <C9A931ED-273F-4B67-B9C2-32C86166312C@gmail.com>

I give MAKER all isoforms as evidence.

Mike
> On Jan 26, 2017, at 3:26 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Hello:
> 
> I am annotating a new genome and collecting proteins from mouse. I found that there are both canonical protein sequences and isoforms. Should I use only the canonical protein sequences, or both the canonical sequences and the isoforms?
> 
> Thanks
> 
> Best
> Quanwei


From qwzhang0601 at gmail.com  Fri Jan 27 09:13:22 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Fri, 27 Jan 2017 11:13:22 -0500
Subject: [maker-devel] transcript assembly of RNA-seq data
Message-ID: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>

Hello:

I wonder which is the best way to make use of RNA-seq data for gene
annotation of a new genome assembly.
(1) De novo assembly without mapping to any genome assembly (like Trinity)?
(2) TopHat+Cufflinks, mapping to the new genome assembly that I want to
annotate?
(3) TopHat+Cufflinks, mapping to a closely related annotated genome (like
mouse or human)?

Thanks

Best
Quanwei

From carsonhh at gmail.com  Fri Jan 27 09:23:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 09:23:40 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
Message-ID: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>

(1) De novo assembly without mapping to any genome assembly (like Trinity)

You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the --jaccard_clip option to reduce transcript merging in Trinity.
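
A minimal sketch of what such a Trinity call might look like. The read file names, memory limit, CPU count and output directory are placeholders and not from this thread, and the jaccard_clip option assumes paired-end reads. The script only assembles and prints the command line; it does not run Trinity.

```shell
#!/bin/sh
# Sketch only: build the command string and print it; nothing is executed.
# All file names and resource values below are hypothetical placeholders.
LEFT="reads_1.fq"        # paired-end reads, first mate
RIGHT="reads_2.fq"       # paired-end reads, second mate
OUTDIR="trinity_out"     # output directory for the assembly

# --jaccard_clip is the option recommended above to reduce transcript merging.
trinity_cmd="Trinity --seqType fq --left ${LEFT} --right ${RIGHT} --max_memory 50G --CPU 16 --jaccard_clip --output ${OUTDIR}"

echo "$trinity_cmd"
```

The resulting Trinity FASTA file would then be given to MAKER as EST evidence.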

--Carson



From cjfields at illinois.edu  Fri Jan 27 15:21:15 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jan 2017 22:21:15 +0000
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
	<4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
Message-ID: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>

Yup, I agree.  Carson, do you know of any instances where HISAT2/STAR + StringTie or reference-based Trinity assemblies were used successfully?

chris


From carsonhh at gmail.com  Fri Jan 27 17:53:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 17:53:10 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
	<4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
	<90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
Message-ID: <DA117F8A-20D0-4F99-96E5-CFF4FDAB1799@gmail.com>

No. My experience has only been with regular Trinity de novo assembly. Of course, I'd be interested in anyone else's attempt at this.

--Carson



From carsonhh at gmail.com  Sat Jan 28 13:53:45 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 28 Jan 2017 13:53:45 -0700
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
References: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
Message-ID: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>

Try adding one of the following to your mpiexec command:

1. --mca btl ^openib
2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0

One of these may fix your issue.  The first tells OpenMPI not to use the InfiniBand (openib) transport; InfiniBand libraries use registered memory in a way that can cause system calls to segfault. Communication will then usually go over another adapter. The second still uses the InfiniBand adapter, but via TCP over InfiniBand, which indirectly bypasses the problem-causing libraries. The third explicitly forces use of the Ethernet adapter instead of the InfiniBand adapter.
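
For illustration, the three variants could be written out as full command lines like this. This is only a sketch: the process count of 16 is taken from the submit script quoted below, the interface names ib0 and eth0 depend on the node and should be checked first, and the script prints the commands rather than running them. Note that the quoted submit script already passes "-mca btl ^openib", so variants 2 and 3 may be the more informative ones to test there.

```shell
#!/bin/sh
# Sketch only: print the three candidate mpiexec command lines, do not run them.
NPROCS=16   # matches "-n 16" in the quoted submit script

opt1='--mca btl ^openib'                                       # skip the openib transport
opt2='--mca btl vader,tcp,self --mca btl_tcp_if_include ib0'   # TCP over the InfiniBand interface
opt3='--mca btl vader,tcp,self --mca btl_tcp_if_include eth0'  # TCP over the Ethernet interface

cmd1="mpiexec ${opt1} -n ${NPROCS} maker"
cmd2="mpiexec ${opt2} -n ${NPROCS} maker"
cmd3="mpiexec ${opt3} -n ${NPROCS} maker"

for cmd in "$cmd1" "$cmd2" "$cmd3"; do
  echo "$cmd"
done
```

Trying one variant per job keeps it clear which setting, if any, removes the segfault.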

--Carson


> On Jan 27, 2017, at 3:31 AM, Rainer Rutka <rainer.rutka at uni-konstanz.de> wrote:
> 
> Hi everybody.
> 
> My name is Rainer. I am an administrator for the HPC systems at our
> university in Konstanz, Baden-Wuerttemberg, Germany.
> The project is called bwHPC-C5.
> 
> See: https://www.bwhpc-c5.de/en/index.php
> 
> I have been trying to get Maker running on our bwUniCluster for weeks.
> Unfortunately, I get errors while running a Maker job in the MPI environment.
> 
> BUILD STATUS
> 
> ==============================================================================
> STATUS MAKER v2.31.9
> ==============================================================================
> PERL Dependencies: VERIFIED
> External Programs: VERIFIED
> External C Libraries: VERIFIED
> MPI SUPPORT: ENABLED
> MWAS Web Interface: DISABLED
> MAKER PACKAGE: CONFIGURATION OK
> 
> MODULES / INCLUDES / COMPILERS
> 
> # knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9
> #
> ##### (B) Dependencies:
> #
> # conflict: any other maker version
> # module load compiler/gnu/5.2
> # module load mpi/openmpi/2.0-gnu-5.2
> [...]
> 
> MPI/MOAB SUBMIT
> 
> [...]
> ### Queues ###
> #MSUB -q fat
> #MSUB -l nodes=1:ppn=16
> #MSUB -l mem=20gb
> #MSUB -l walltime=50:00:00
> #
> [...]
> echo " "
> echo "### Loading MAKER module:"
> echo " "
> module load bio/maker/2.31.9
> [ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
> echo "MAKER_VERSION = $MAKER_VERSION"
> module list
> [...]
> echo " "
> echo "### Runing Maker example"
> echo " "
> export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so
> export OMPI_MCA_mpi_warn_on_fork=0
> 
> echo "LD_PRELOAD=${LD_PRELOAD}"
> #
> # "STATUS: Processing and indexing input FASTA files..."
> #
> mpiexec -mca btl ^openib -n 16 maker
> [...]
> 
> 
> E R R O R S
> =======
> [...]
> LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> [uc1n338:113607] *** Process received signal ***
> [uc1n338:113607] Signal: Segmentation fault (11)
> [uc1n338:113607] Signal code: Address not mapped (1)
> [uc1n338:113607] Failing at address: 0x4b0
> [uc1n338:113608] *** Process received signal ***
> [uc1n338:113608] Signal: Segmentation fault (11)
> [uc1n338:113608] Signal code: Address not mapped (1)
> [uc1n338:113608] Failing at address: 0x4b0
> [uc1n338:113621] *** Process received signal ***
> [uc1n338:113621] Signal: Segmentation fault (11)
> [uc1n338:113621] Signal code: Address not mapped (1)
> [uc1n338:113621] Failing at address: 0x4b0
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> [...]
> 
> WHATS WRONG HERE!?
> 
> Thank you for your help!
> 
> All the best ,
> 
> Rainer
> 
> -- 
> Rainer Rutka
> University of Konstanz
> Communication, Information, Media Centre (KIM)
> * High-Performance-Computing (HPC)
> * KIM-Support and -Base-Services
> Room: V511
> 78457 Konstanz, Germany
> +49 7531 88-5413
> 


From rainer.rutka at uni-konstanz.de  Mon Jan 30 01:32:08 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Mon, 30 Jan 2017 09:32:08 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
References: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
	<73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
Message-ID: <c89c60e5-1162-1297-5d71-99b1cbf315ec@uni-konstanz.de>

Hi Carson!

Thank you VERY MUCH for your hints.

Much appreciated!

I'll test these today and let you know about the results.

Again: THANKS! :-)

BTW: I'm not a scientist. Only a system operator.

:-)

On 28.01.2017 at 21:53, Carson Holt wrote:
> Try adding one of the following to your mpiexec command:
> 1. --mca btl ^openib
> 2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
> 3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0
> One of these may fix your issue.  The first tells OpenMPI not to use the InfiniBand (openib) transport; InfiniBand libraries use registered memory in a way that can cause system calls to segfault. Communication will then usually go over another adapter. The second still uses the InfiniBand adapter, but via TCP over InfiniBand, which indirectly bypasses the problem-causing libraries. The third explicitly forces use of the Ethernet adapter instead of the InfiniBand adapter.
> --Carson

-- 
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413


From qwzhang0601 at gmail.com  Tue Jan 31 10:36:13 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 12:36:13 -0500
Subject: [maker-devel] collecting protein sequences as evidences
Message-ID: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>

I wonder what's the best way to collect protein sequences for gene
annotation of a de novo genome assembly.
(1) My first choice is to get human and mouse protein sequences from
UniProt. Here I am not sure whether I should download the reviewed
entries (i.e., Swiss-Prot) or the automatically annotated ones (i.e.,
TrEMBL).
(2) On the other hand, I also have protein sequences from NCBI. Should I
simply merge those FASTA files? Does it matter if there are
redundancies? Also, protein sequences from different sources may not
have the same quality. Do I need to do anything before I integrate
protein sequences from different sources?

Many thanks

Best
Quanwei

From qwzhang0601 at gmail.com  Tue Jan 31 12:08:21 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 14:08:21 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from different
	tissues and individuals
Message-ID: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>

Hello:

I am trying to assemble transcripts from RNA-seq data with Trinity; the
assemblies will be used as evidence for gene annotation with Maker. I have
data from two tissues with two replicates each. Should I merge all four
samples to get one assembly file? Or should I merge the replicates of each
tissue separately and use the two assembly files as input to Maker? Merging
all samples into one would give much higher coverage, but I think some
genes may be expressed as tissue-specific isoforms, so I am not sure
whether I should merge RNA-seq data from different tissues.
What's more, I found some published RNA-seq data from another individual
of the same species (and from a different tissue than ours). Should I
merge all RNA-seq data together (across individuals and tissues)? Or should
I generate separate transcript assemblies and use all of them as input to
Maker?

Thanks
Best
Quanwei

From michael.s.campbell1 at gmail.com  Tue Jan 31 12:26:29 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 14:26:29 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from
	different tissues and individuals
In-Reply-To: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
Message-ID: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>

I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER as a comma-separated list in the opts file.

Example: 
est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta
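
If there are many assemblies, the est= line can be generated rather than typed. The following is a small sketch; the two per-tissue file names are made up for illustration, and each is assumed to be a Trinity assembly built from the merged replicates of one tissue.

```shell
#!/bin/sh
# Sketch only: join per-tissue assembly paths into one est= line for the opts file.
# The paths are hypothetical placeholders (one Trinity assembly per tissue).
set -- /PATH/TO/tissueA_Trinity.fasta /PATH/TO/tissueB_Trinity.fasta

est_list=""
for f in "$@"; do
  # Add a comma before every entry except the first.
  est_list="${est_list:+${est_list},}${f}"
done

est_line="est=${est_list}"
echo "$est_line"
```

The printed line can be pasted into the opts file in place of the existing est= entry.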

Good luck,
Mike



From michael.s.campbell1 at gmail.com  Tue Jan 31 13:57:28 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 15:57:28 -0500
Subject: [maker-devel] collecting protein sequences as evidences
In-Reply-To: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>
References: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>
Message-ID: <2E4D90C9-6D6E-4F52-A361-AFB06A61D2C2@gmail.com>

Hi Quanwei,

(1) When I use UniProt, I use Swiss-Prot and not TrEMBL.
(2) I don't merge files together. I just pass them all to MAKER as a comma-separated list.
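
As a rough sketch of point (2), the protein= line in the opts file could be built from separate FASTA files like this. The three file names are invented for illustration (Swiss-Prot human, Swiss-Prot mouse, and an NCBI download); the files are left unmerged and only listed.

```shell
#!/bin/sh
# Sketch only: list separate protein FASTA files on one protein= line.
# All paths are hypothetical placeholders.
join_csv() {
  # Join all arguments with commas; the subshell keeps the IFS change local.
  ( IFS=,; printf '%s' "$*" )
}

protein_line="protein=$(join_csv \
  /PATH/TO/sprot_human.fasta \
  /PATH/TO/sprot_mouse.fasta \
  /PATH/TO/ncbi_proteins.fasta)"

echo "$protein_line"
```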

Thanks,
Mike



From cjfields at illinois.edu  Tue Jan 31 14:05:43 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 21:05:43 +0000
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
Message-ID: <A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>

I agree with Mike.  I also suggest not combining RNA-seq data from different runs (e.g. different studies) even if they are from the same tissue, developmental stage, etc. Many other factors (biological variation, sample quality, differences in sequencing chemistry or technology, etc.) can significantly and negatively impact transcript assembly quality.

chris


From mjfi2sb3 at gmail.com  Tue Jan 31 14:14:14 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Tue, 31 Jan 2017 21:14:14 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
	<357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
Message-ID: <CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>

Hi Christopher,

How would you identify a low confidence transcript? And how do you remove
them? Also, did you try setting a minimum read coverage in Trinity as the
default is one?

Best
/SB

On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, <cjfields at illinois.edu>
wrote:

> If I recall, from a BAM you would need to run a reference-based assembly
> on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use
> Trinity for ref-based assembly.  But I always choose the route of a full de
> novo assembly (again, Trinity or similar) when possible, doing some basic
> cleanup (e.g. remove low confidence transcripts) and bring them as EST
> evidence.
>
> chris
>
> From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of
> Scott Cain <scott at scottcain.net>
> Date: Wednesday, January 25, 2017 at 2:23 PM
> To: Maya Britstein <mayabritstein at gmail.com>
> Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "
> help at gmod.org" <help at gmod.org>
> Subject: Re: [maker-devel] GFF3 file format
>
> Hi Maya,
>
> I'm not sure what MAKER's requirements are in this regard--I'm forwarding
> this to their mailing list.
>
> Scott
>
>
> On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com>
> wrote:
>
> Hi,
>
> I have RNA-seq data, and genomic data that I want to annotate using maker.
>
> From what I understood, I need to generate a GFF3 file from the
> RNA-seq mapping results. I mapped the RNA sequences to the genome
> using bowtie and tophat. However, I still do not know how to convert
> the output to a GFF3 file that I can then use in maker as
> annotation evidence.
>
> I saw the wiki page, but it does not mention how to make this conversion
> (http://gmod.org/wiki/GFF3).
>
> Can you please help me?
>
> Sincerely,
> Maya
>
> ----
> Maya Britstein
> Ph.D candidate
> Laura Steindler's Lab
> Marine Biology Department
> Leon H. Charney School of Marine Sciences
> University of Haifa, Israel
>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                    scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)   216-392-3087
> Ontario Institute for Cancer Research
>
>

From qwzhang0601 at gmail.com  Tue Jan 31 14:33:12 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 16:33:12 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
Message-ID: <CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>

Thank you all for your suggestions. So you suggest not using RNA-seq
data from another study, even if I assemble them separately and then provide
both assemblies to Maker as a comma-separated list? The issues you
mentioned do exist, but some people have collected RNA-seq data from
different individuals and used them for gene annotation (e.g.,
doi:10.1038/ng.3198). Thank you for your suggestions; I will think about it.

Best
Quanwei


From carsonhh at gmail.com  Tue Jan 31 14:35:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 31 Jan 2017 14:35:20 -0700
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
	<CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
Message-ID: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>

I think he means not to combine them when preparing the transcript assemblies (i.e. assemble them separately). You still provide them all to MAKER as a comma-separated list.

--Carson
 
> On Jan 31, 2017, at 2:33 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Thank you guys for your suggestions. So you do not suggest using RNA-seq data from another study, even if I assemble them separately and then provide both assemblies to Maker as a comma-separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it.
> 
> Best
> Quanwei 
> 
> 2017-01-31 16:05 GMT-05:00 Fields, Christopher J <cjfields at illinois.edu>:
> I agree with Mike.  I also suggest not combining RNA-Seq data from different runs (e.g. different studies) even if they are from the same tissue, developmental stage, etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc.) that can significantly and negatively impact transcript assembly quality.
> 
> chris
> 
> On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" <maker-devel-bounces at yandell-lab.org on behalf of michael.s.campbell1 at gmail.com> wrote:
> 
>     I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma-separated list in the opts file.
> 
>     Example:
>     est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta
> 
>     Good luck,
>     Mike
> 
>     > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>     >
>     > Hello:
>     >
>     > I am trying to assemble transcripts from RNA-seq data using Trinity; the assemblies will be used for gene annotation with Maker. I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to Maker? Merging all samples into one gives a much higher coverage level, but I think some genes may be expressed as tissue-specific isoforms, so I am not sure whether I should merge RNA-seq data from different tissues.
>     > What's more, I found some published RNA-seq data from another individual of the same species (and from a different tissue than ours). Should I merge all the RNA-seq data together (across individuals and tissues)? Or should I generate separate transcript assemblies and use all of them as input to Maker?
>     >
>     > Thanks
>     > Best
>     > Quanwei

From cjfields at illinois.edu  Tue Jan 31 16:05:43 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 23:05:43 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
	<357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
	<CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>
Message-ID: <8BD384C9-4E46-42AC-A59F-96299EF5E104@illinois.edu>

You can use RSEM for some initial filtering:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification#filtering-transcripts

Then I generally use the Trinity QA steps, in particular TransRate or DETONATE:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment
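
As a rough illustration of the expression-based filtering step (not the Trinity script itself), the sketch below keeps only transcripts whose TPM in an RSEM isoforms.results table meets a cutoff, then filters the assembly FASTA to match. The 1.0 TPM cutoff, the function names, and the toy records are assumptions made for the example; choose a threshold that suits your own data.

```python
import csv
import io


def ids_above_tpm(results_handle, min_tpm=1.0):
    """Collect transcript IDs whose TPM is at least min_tpm.

    Expects a tab-separated table with 'transcript_id' and 'TPM' columns,
    as found in RSEM *.isoforms.results files.
    """
    reader = csv.DictReader(results_handle, delimiter="\t")
    return {row["transcript_id"] for row in reader if float(row["TPM"]) >= min_tpm}


def filter_fasta(fasta_handle, keep_ids):
    """Yield FASTA lines for records whose first header token is in keep_ids."""
    keep = False
    for line in fasta_handle:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in keep_ids
        if keep:
            yield line


# Tiny in-memory demo with made-up records; real use would open files instead.
results = io.StringIO("transcript_id\tgene_id\tTPM\nt1\tg1\t12.5\nt2\tg1\t0.2\n")
fasta = io.StringIO(">t1 len=4\nACGT\n>t2 len=4\nTTTT\n")
kept = ids_above_tpm(results, min_tpm=1.0)
print("".join(filter_fasta(fasta, kept)), end="")
```

The filtered FASTA is what would then be passed to MAKER as EST evidence.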

chris

From: Salim Bougouffa <mjfi2sb3 at gmail.com>
Date: Tuesday, January 31, 2017 at 3:14 PM
To: Chris Fields <cjfields at illinois.edu>, Scott Cain <scott at scottcain.net>, Maya Britstein <mayabritstein at gmail.com>
Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "help at gmod.org" <help at gmod.org>
Subject: Re: [maker-devel] GFF3 file format


Hi Christopher,

How would you identify a low confidence transcript? And how do you remove them? Also, did you try setting a minimum read coverage in Trinity as the default is one?

Best
/SB

On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, <cjfields at illinois.edu> wrote:
If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly.  But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. removing low-confidence transcripts) and bringing them in as EST evidence.

chris

From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of Scott Cain <scott at scottcain.net>
Date: Wednesday, January 25, 2017 at 2:23 PM
To: Maya Britstein <mayabritstein at gmail.com>
Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "help at gmod.org" <help at gmod.org>
Subject: Re: [maker-devel] GFF3 file format

Hi Maya,

I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list.

Scott


On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com> wrote:
Hi,

I have RNA-seq data, and genomic data that I want to annotate using maker.

From what I understood, I need to generate a GFF3 file from the mapped RNA-seq reads. I have mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to convert this output to a GFF3 file that I can then use in maker as annotation evidence.

I saw the wiki page, but it did not mention how to make this conversion (http://gmod.org/wiki/GFF3).

Can you please help me?

Sincerely,
Maya

----
Maya Britstein
Ph.D candidate
Laura Steindler's Lab
Marine Biology Department
Leon H. Charney School of Marine Sciences
University of Haifa, Israel


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu  Tue Jan 31 16:07:44 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 23:07:44 +0000
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
	<CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
	<656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>
Message-ID: <CAE4C80D-DD8F-4A2C-A33B-535456D233AE@illinois.edu>

Exactly

chris

From: Carson Holt <carsonhh at gmail.com>
Date: Tuesday, January 31, 2017 at 3:35 PM
To: Quanwei Zhang <qwzhang0601 at gmail.com>
Cc: Chris Fields <cjfields at illinois.edu>, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals

I think he means not to combine them for the transcript assembly preparation (i.e. assemble them separately). But you still provide them all to MAKER as a comma-separated list.

--Carson

On Jan 31, 2017, at 2:33 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:

Thank you guys for your suggestions. So you do not suggest using RNA-seq data from another study, even if I assemble them separately and then provide both assemblies to Maker as a comma-separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it.
Best
Quanwei

2017-01-31 16:05 GMT-05:00 Fields, Christopher J <cjfields at illinois.edu>:
I agree with Mike.  I also suggest not combining RNA-Seq data from different runs (e.g. different studies) even if they are from the same tissue, developmental stage, etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc.) that can significantly and negatively impact transcript assembly quality.

chris

On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" <maker-devel-bounces at yandell-lab.org on behalf of michael.s.campbell1 at gmail.com> wrote:

    I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma-separated list in the opts file.

    Example:
    est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta

    Good luck,
    Mike

    > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
    >
    > Hello:
    >
    > I am trying to assemble transcripts from RNA-seq data using Trinity; the assemblies will be used for gene annotation with Maker. I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to Maker? Merging all samples into one gives a much higher coverage level, but I think some genes may be expressed as tissue-specific isoforms, so I am not sure whether I should merge RNA-seq data from different tissues.
    > What's more, I found some published RNA-seq data from another individual of the same species (and from a different tissue than ours). Should I merge all the RNA-seq data together (across individuals and tissues)? Or should I generate separate transcript assemblies and use all of them as input to Maker?
    >
    > Thanks
    > Best
    > Quanwei

From rob.syme at gmail.com  Wed Jan  4 23:41:25 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Thu, 05 Jan 2017 06:41:25 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
Message-ID: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>

Hi all

The MAKER wiki page "Repeat Library Construction - Advanced
<http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced>"
describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded
MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there.
Are they distributed with MAKER or separately? Does anybody know where to
find them?

Thanks!

Rob Syme
Research Associate
Curtin University

From olegl at volcani.agri.gov.il  Thu Jan  5 03:07:31 2017
From: olegl at volcani.agri.gov.il (Oleg Lovky)
Date: Thu, 5 Jan 2017 10:07:31 +0000
Subject: [maker-devel] Unable to train SNAP
Message-ID: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>

Hello,

I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000).
However, I'm not getting transcript and protein FASTA files at all, even though Maker gives no errors and everything is listed as finished in the datastore log file.
Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files.

Please advise.

Regards,

Oleg Lovky, MSc.
Research Engineer
Institute of Plant Sciences
ARO, Volcani Center
Cell: 054-4870319

From michael.s.campbell1 at gmail.com  Thu Jan  5 07:54:17 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 5 Jan 2017 09:54:17 -0500
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <3B3F80CA-BFA1-4F0E-A2F1-CA60E8496D5F@gmail.com>

Hi Rob,

There is a link near the bottom of that wiki page, at the end of this line:

"CRL and other custom scripts are available here."

That points to this URL: http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz

Thanks,
Mike
> On Jan 5, 2017, at 1:41 AM, Rob Syme <rob.syme at gmail.com> wrote:
> 
> Hi all
> 
> The MAKER wiki page "Repeat Library Construction - Advanced <http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced>" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately? Does anybody know where to find them?
> 
> Thanks!
> 
> Rob Syme
> Research Associate
> Curtin University

From rob.syme at gmail.com  Thu Jan  5 18:29:35 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Fri, 06 Jan 2017 01:29:35 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <CAEf4xgcZXf18ZWD9JusvrsyUdLCg_wOe2SuA2d91mnTcug+u1w@mail.gmail.com>

Oh dear. That's embarrassing for me! Sorry for the silly question.

-r

On Thu, 5 Jan 2017 at 14:41 Rob Syme <rob.syme at gmail.com> wrote:

> Hi all
>
> The MAKER wiki page "Repeat Library Construction - Advanced
> <http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced>"
> describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded
> MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there.
> Are they distributed with MAKER or separately? Does anybody know where to
> find them?
>
> Thanks!
>
> Rob Syme
> Research Associate
> Curtin University
>

From xvazquezc at gmail.com  Thu Jan  5 19:23:17 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Fri, 6 Jan 2017 13:23:17 +1100
Subject: [maker-devel] Unable to train SNAP
In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
Message-ID: <CAL0hg4EEQk5CWkrni6-o29m_mAOkYjLKqjA8Df04FKJMbfDB8g@mail.gmail.com>

Are you using the -n option with maker2zff? You often get empty genome.ann
and genome.dna files if you don't.

On 5 January 2017 at 21:07, Oleg Lovky <olegl at volcani.agri.gov.il> wrote:

> Hello,
>
> I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1)
> containing ~50k reads (length ranges from 70 to 12000).
> However, I'm not getting transcript and proteins fasta files at all,
> despite Maker not giving any errors and everything is listed as finished in
> the datastore log file.
> Furthermore, when trying to use maker2zff I'm getting empty genome.ann and
> genome.dna files.
>
> Please advise.
>
> Regards,
>
> Oleg Lovky, MSc.
> Research Engineer
> Institute of Plant Sciences
> ARO, Volcani Center
> Cell: 054-4870319


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com  Fri Jan  6 12:28:02 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 6 Jan 2017 12:28:02 -0700
Subject: [maker-devel] Unable to train SNAP
In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
Message-ID: <8F65E561-7450-4B5A-8F1B-4E51C0D25BE2@gmail.com>

The maker2zff script has a number of thresholds that must be reached to avoid filtering all models. If you don't have protein evidence in the dataset, for example, then that filter may always be failing. You may just want to turn all filters off with the -n option as previously suggested.
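
A quick way to tell whether the filters removed everything is to check the two output files for content. The sketch below is only an illustration: the helper name is hypothetical, and it assumes genome.ann and genome.dna were written to the directory you pass in.

```python
import os


def empty_zff_outputs(directory=".", names=("genome.ann", "genome.dna")):
    """Return the names of expected output files that are missing or zero bytes."""
    empty = []
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            empty.append(name)
    return empty


if __name__ == "__main__":
    problems = empty_zff_outputs()
    if problems:
        print("Empty or missing:", ", ".join(problems))
        print("All models may have been filtered out; consider rerunning maker2zff with -n.")
    else:
        print("genome.ann and genome.dna both contain data.")
```

If both files are reported empty, rerunning with -n is the first thing to try before looking for problems in the MAKER run itself.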

--Carson


> On Jan 5, 2017, at 3:07 AM, Oleg Lovky <olegl at volcani.agri.gov.il> wrote:
> 
> Hello,
> 
> I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000).
> However, I'm not getting transcript and proteins fasta files at all, despite Maker not giving any errors and everything is listed as finished in the datastore log file.
> Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files.
> 
> Please advise.
> 
> Regards,
> 
> Oleg Lovky, MSc.
> Research Engineer
> Institute of Plant Sciences
> ARO, Volcani Center
> Cell: 054-4870319

From kchilds at msu.edu  Thu Jan  5 07:28:00 2017
From: kchilds at msu.edu (Childs, Kevin)
Date: Thu, 5 Jan 2017 14:28:00 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <6AE4044B-9011-4421-A6F1-FE3B95BBB11D@msu.edu>

Rob,

The scripts can be found in a link at the bottom of this wiki page:

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced

Kevin Childs

---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Center for Genomics-Enabled Plant Science
Plant Biology Department
Michigan State University

kchilds at msu.edu
517-775-2844 (m)
517-884-6926 (o)

http://childslab.plantbiology.msu.edu


> On Jan 5, 2017, at 1:41 AM, Rob Syme <rob.syme at gmail.com> wrote:
> 
> Hi all
> 
> The MAKER wiki page "Repeat Library Construction - Advanced" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately? Does anybody know where to find them?
> 
> Thanks!
> 
> Rob Syme
> Research Associate
> Curtin University


From brubin at fieldmuseum.org  Fri Jan  6 18:22:10 2017
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Fri, 6 Jan 2017 20:22:10 -0500
Subject: [maker-devel] /tmp full
Message-ID: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>

Hi all,

Maker keeps filling up the /tmp directories on the cluster I am using. It
appears that most of the space is taken with many versions of various blast
databases. I suspect that this issue is partly due to my not using MPI and
instead launching multiple instances of maker (typically 16) in the same
working directory. However, it appears that maker is also leaving some of
these databases in /tmp even after it has died or been killed and they are
piling up.

I am submitting my jobs to the cluster via SLURM but have installed maker
locally rather than system-wide. My system administrator is going to try
creating a larger locally mounted directory on some of the nodes for me but
I wanted to check to see if you have any other suggestions to solve the
issue or make sure that maker cleans up /tmp as aggressively as possible.

I am using maker3-beta.

Thanks for any help,
Ben

From carsonhh at gmail.com  Sat Jan  7 16:29:29 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 7 Jan 2017 16:29:29 -0700
Subject: [maker-devel] /tmp full
In-Reply-To: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
References: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
Message-ID: <DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>

If you use the MPI settings, then all processes will share a single temporary directory, otherwise they each will have a separate one since they can't intercommunicate.

MAKER tries to clean up its files on finish or failure, but if you or the system kill it with certain signals, then it is reaped immediately by the system and not allowed to finish cleaning up. Signals 9 and 19, for example, will do that. If a failure is related to the drive being full or a memory issue, then your system may be hitting it with one of these uncatchable signals. For example, SLURM may use signal 9 or 19 if a process fails to respond to signal 15 in a timely manner (i.e. MAKER may be removing files, but SLURM gets impatient and kills it more aggressively because it thinks the process is not responding). You can always try to empty /tmp as the first step in your batch script; this removes files belonging to you before MAKER launches.
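
One way to do that cleanup step is sketched below, assuming a POSIX node and that nothing else you own in the temporary directory is still needed. The function name is hypothetical, and it is deliberately not called automatically: it deletes every top-level entry you own there, including files from any other job of yours running on the same node.

```python
import os
import shutil
import tempfile


def remove_own_entries(tmp_dir=None):
    """Delete top-level entries in tmp_dir owned by the current user.

    Returns the sorted names of the entries that were removed.
    Entries owned by other users are left untouched.
    """
    tmp_dir = tmp_dir or tempfile.gettempdir()
    uid = os.getuid()  # POSIX only
    removed = []
    # Snapshot the listing first so deletions do not disturb the iteration.
    for entry in list(os.scandir(tmp_dir)):
        try:
            if entry.stat(follow_symlinks=False).st_uid != uid:
                continue
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.remove(entry.path)
            removed.append(entry.name)
        except OSError:
            # Entry vanished or cannot be removed; skip it.
            continue
    return sorted(removed)


# Example call for the start of a batch job (uncomment with care):
# remove_own_entries("/tmp")
```

Running this once per node before MAKER starts clears leftovers from earlier jobs that were killed before they could tidy up.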

--Carson


> On Jan 6, 2017, at 6:22 PM, Benjamin Rubin <brubin at fieldmuseum.org> wrote:
> 
> Hi all,
> 
> Maker keeps filling up the /tmp directories on the cluster I am using. It appears that most of the space is taken with many versions of various blast databases. I suspect that this issue is partly due to my not using MPI and instead launching multiple instances of maker (typically 16) in the same working directory. However, it appears that maker is also leaving some of these databases in /tmp even after it has died or been killed and they are piling up. 
> 
> I am submitting my jobs to the cluster via SLURM but have installed maker locally rather than system-wide. My system administrator is going to try creating a larger locally mounted directory on some of the nodes for me but I wanted to check to see if you have any other suggestions to solve the issue or make sure that maker cleans up /tmp as aggressively as possible.
> 
> I am using maker3-beta.
> 
> Thanks for any help,
> Ben


From brubin at fieldmuseum.org  Sun Jan  8 09:24:36 2017
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Sun, 8 Jan 2017 11:24:36 -0500
Subject: [maker-devel] /tmp full
In-Reply-To: <DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>
References: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
	<DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>
Message-ID: <CAKpVPBLfiYakZ3Ce2q02gYXatJHJzJ8dW-YMgscg9Nm6-KT03w@mail.gmail.com>

OK, thanks for the tips. Knowing the particulars of how SLURM might be
causing this is extremely helpful. I'll try to just empty /tmp before
running MAKER on each node, as you suggest. I suspect that will work but
will work on getting MPI running as well.

Thanks!
Ben

On Sat, Jan 7, 2017 at 6:29 PM, Carson Holt <carsonhh at gmail.com> wrote:

> If you use the MPI settings, then all processes will share a single
> temporary directory, otherwise they each will have a separate one since
> they can't intercommunicate.
>
> MAKER tries to clean up its files on finish or failure, but if you or the
> system kill it with certain signals, then it is reaped immediately by the
> system and not allowed to finish cleaning up. Signals 9 and 19 for example
> will do that. If a failure is related to the drive being full or a memory
> issue, then your system may be hitting it with one of these uncatchable
> signals. For example SLURM may use signal 9 or 19 if a process fails to
> respond to signal 15 in a timely manner (i.e. MAKER may be removing files,
> but SLURM gets impatient and kills it more aggressively because it thinks
> the process is not responding). You can always try and empty /tmp as the
> first step in your batch script, and it will remove files belonging to you
> before launching MAKER.
>
> --Carson
>
>
>
>
> > On Jan 6, 2017, at 6:22 PM, Benjamin Rubin <brubin at fieldmuseum.org>
> wrote:
> >
> > Hi all,
> >
> > Maker keeps filling up the /tmp directories on the cluster I am using.
> It appears that most of the space is taken with many versions of various
> blast databases. I suspect that this issue is partly due to my not using
> MPI and instead launching multiple instances of maker (typically 16) in the
> same working directory. However, it appears that maker is also leaving some
> of these databases in /tmp even after it has died or been killed and they
> are piling up.
> >
> > I am submitting my jobs to the cluster via SLURM but have installed
> maker locally rather than system-wide. My system administrator is going to
> try creating a larger locally mounted directory on some of the nodes for me
> but I wanted to check to see if you have any other suggestions to solve the
> issue or make sure that maker cleans up /tmp as aggressively as possible.
> >
> > I am using maker3-beta.
> >
> > Thanks for any help,
> > Ben
>
>


-- 
_____________________________________________________
Benjamin ER Rubin, PhD
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

From lmainzer at life.illinois.edu  Mon Jan  9 00:02:01 2017
From: lmainzer at life.illinois.edu (Liudmila Sergeevna Mainzer)
Date: Mon, 9 Jan 2017 01:02:01 -0600
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
Message-ID: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>

Hello, MAKER developers!

I tried submitting this bug report through the web form on the 
RepeatMasker web page, but I am getting an "invalid submission" message, 
so I decided to post here.

I found a weird bug that results in the notorious "index out of bounds" 
error reported by RepeatMasker. Significantly, this error only arises on 
very long file names generated by MAKER.

I traced this through the code and found that the error originates in 
Tandem Repeats Finder (TRF). TRF sometimes splits its output into separate 
files. When that happens, the pieces with index >1 do not contain the 
sequence name. Compare the first few lines of these two files:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html
 
<HTML><HEAD><TITLE>InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html</TITLE></HEAD><BODY 


     bgcolor="#File 1 of 2 FBF8BC"><PRE>
     Tandem Repeats Finder Program written by:
                   Gary Benson
                   Program in Bioinformatics
                   Boston University
     Version 4.09
     Sequence: InputSequencefrag-1 CHUNK number:191 size:455659 offset:57300000
     Parameters: 2 3 5 75 20 33 7

etcetera.
Now the second chunk:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html
 
<HTML><HEAD><TITLE>InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html</TITLE></HEAD><BODY 


     bgcolor="#File 2 of 2 Found at i:56286 original size:1 final size:1
     <A NAME="56278--56322,1,45.0,1,1136"></A><A
     HREF="http://tandem.bu.edu/trf/trf.definitions.html#alignment" target
     ="explanation">Alignment explanation</A><BR><BR>
        Indices: 56278--56322  Score: 55
        Period size: 1  Copynumber: 45.0  Consensus size: 1

etcetera


See how one file has the full header with the "Sequence:" statement and 
the other does not? RepeatMasker uses this "Sequence:" statement to name 
each piece of sequence that is masked later. When this variable is empty 
(the name string is not defined), the setSubstr subroutine in the main 
RepeatMasker code breaks: the length of an undefined string is of course 
zero, and that subroutine checks for sequences whose length is shorter 
than the region that needs to be masked.

So it quits with the statement "Error index out of bounds!", even though 
the sequence is finite length, does not have any weird characters, and 
is maskable.

Once again, this only arises on very long file names, and those seem to 
be created by MAKER. Example:
LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific

Notice how the last part of the file name has a bunch of identifiers 
separated by the %2E (generic URI-encoding)? I experimented with that 
file name. The path does not matter. The % signs do not matter. It is 
the length of the filename itself: if it is <108 characters, then 
RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like 
maybe Perl is not handling that long a name very well...
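
The length check described above can be sketched in Python. This is only an 
illustration: the 107-character limit is the threshold observed in this 
report, not a documented RepeatMasker or TRF limit, and the function names 
are hypothetical.

```python
import os

# Longest file name that still worked in the report above (108 or more failed).
# This is an observed value, not a documented limit of RepeatMasker or TRF.
MAX_SAFE_NAME_LEN = 107


def name_too_long(path, limit=MAX_SAFE_NAME_LEN):
    """Return True if the last path component is longer than `limit` characters."""
    return len(os.path.basename(path)) > limit


def find_long_names(root, limit=MAX_SAFE_NAME_LEN):
    """Walk `root` and return the sorted paths of files whose names exceed `limit`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if len(name) > limit:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Only the file name is measured, not the full path, because the path did not 
matter in the experiments described above.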

So the problem is three-fold: MAKER creates very long file names, TRF 
fails to write the file headers properly for those names, and 
RepeatMasker breaks as a result.

Could you provide any suggestions or patches for this problem? It is 
forcing us to run RepeatMasker separately, outside the main MAKER 
workflow, which really complicates the data management and analysis as a 
whole.
We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl 
v5.10.1 built for x86_64-linux-thread-multi.

Many thanks in advance,
Liudmila Mainzer

----------------
Senior Research Scientist
National Center for Supercomputing Applications

Research Assistant Professor
Institute of Genomic Biology

University of Illinois
217-300-0568
1205 W. Clark St. Room 4026
Urbana, IL 61801


From carsonhh at gmail.com  Mon Jan  9 09:30:09 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 9 Jan 2017 09:30:09 -0700
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
In-Reply-To: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>
References: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>
Message-ID: <733D5263-6CFC-4AB3-BFDD-30330B0E1985@gmail.com>

The name used by MAKER is based on the input file name, so a quick fix would be to rename your input file to have a shorter name.

--Carson




From qlian003 at ucr.edu  Wed Jan 11 22:28:32 2017
From: qlian003 at ucr.edu (Qihua Liang)
Date: Wed, 11 Jan 2017 21:28:32 -0800
Subject: [maker-devel] gff file: possible sources
Message-ID: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>

Hi MAKER development team!

I am trying to understand the second column of the GFF file generated by MAKER, which should be the source of the annotation. The tutorial lists the following:

Possible Sources Include:
BLASTN - BLASTN alignment of EST evidence
BLASTX - BLASTX alignment of protein evidence
TBLASTX - TBLASTX alignment of EST evidence from closely related organisms
EST2Genome - Polished EST alignment from Exonerate
Protein2Genome - Polished protein alignment from Exonerate
SNAP - SNAP ab initio gene prediction
GENEMARK - GeneMark ab initio gene prediction
Augustus - Augustus ab initio gene prediction
FgenesH - FGENESH ab initio gene prediction
Repeatmasker - RepeatMasker identified repeat
RepeatRunner - RepeatRunner identified repeat from the repeat protein database
tRNAScan - tRNAScan-SE tRNA predictions (coming soon)
PASA - PASA gene predictions (coming soon)

There are other sources that I noticed in my GFF file, like cdna2genome. Is there any more detailed documentation explaining such sources besides those listed above?

Thanks
Qihua


From dence at genetics.utah.edu  Thu Jan 12 06:28:24 2017
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 12 Jan 2017 13:28:24 +0000
Subject: [maker-devel] gff file: possible sources
In-Reply-To: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>
References: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>
Message-ID: <DE48F3CB-8B72-43A6-8331-ED1B811CDCCE@genetics.utah.edu>

Hi Qihua, the cdna2genome is the polished tblastx alignments from Exonerate. Basically, the source column should be the name of the tool that generated the alignment, prediction, or gene model.
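
To see which source names actually occur in a given GFF3 file, the second column can be tallied. The following is a minimal Python sketch, not part of MAKER; the function name is hypothetical, and it assumes a tab-delimited GFF3 file in which an optional "##FASTA" directive marks the start of embedded sequence.

```python
from collections import Counter


def count_sources(gff_lines):
    """Count the values in column 2 (source) of GFF3 feature lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            # Everything after this directive is sequence, not features.
            break
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and other directives
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column feature line
        counts[fields[1]] += 1
    return counts
```

Passing an open file handle (for example, `count_sources(open("my.gff"))`, where the file name is a placeholder) returns a Counter whose keys are the source names present in the file.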

~Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From patel.kumar.vipul at gmail.com  Fri Jan 20 01:44:26 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 09:44:26 +0100
Subject: [maker-devel] Maker crash for long chrm.
Message-ID: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>

Hi,

I hope someone can help me to figure out what is actually going wrong.

I installed MAKER 2.31.9, MPICH, and BioPerl 1.7 via CPAN, and pointed the TMP
variable to a location that is not on NFS. The provided test case, as well as
small contigs (1 kb to 1 MB), runs without any problems.

When I apply it to a larger sequence, for example one of 57 MB, it fails. I also
tried a different sequence of around 60 MB, with the same outcome.

I looked into the logs, but they were not really helpful, as they only stated
that the job failed.

It crashed with the following message:

deleted:0 genes
substr outside of string at /usr/share/perl/5.18/Carp.pm line 165.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/Bio/Root/Root.pm:447
STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/Bio/Tools/CodonTable.pm:419
STACK: CGL::TranslationMachine::longest_translation_plus_stop programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280
STACK: maker::auto_annotator::get_translation_seq programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236
STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../lib/Widget/snap.pm:974
STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/snap.pm:690
STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194
STACK: Process::MpiChunk::_go programs/maker/maker/bin/../lib/Process/MpiChunk.pm:1469
STACK: Process::MpiChunk::run programs/maker/maker/bin/../lib/Process/MpiChunk.pm:341
STACK: programs/maker/maker/bin/maker:979
-----------------------------------------------------------
--> rank=16, hostname=dummy
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:chr_test

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:chr_test

examining contents of the fasta file and run log


--Next Contig--

Processing run.log file...

I get the same message if I run it without MPI, so I guess it is not an
MPI issue.
How can I find out whether some jobs died, which might have led to this
problem?
Are there other ideas for how I can tackle this problem?

Kind regards

From patel.kumar.vipul at gmail.com  Fri Jan 20 06:34:28 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 14:34:28 +0100
Subject: [maker-devel] Maker crash for long chrm.
In-Reply-To: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
References: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
Message-ID: <CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>

Solved. After some digging and printing I found the problem.

It was SNAP itself!

For anybody who runs into the same problem: check SNAP. Apparently it
was not compiled correctly and therefore produced nonconforming output.
Recompiling solved my issue.

Kind regards


From carsonhh at gmail.com  Fri Jan 20 15:00:49 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 20 Jan 2017 15:00:49 -0700
Subject: [maker-devel] Maker crash for long chrm.
In-Reply-To: <CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>
References: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
	<CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>
Message-ID: <59841676-741F-496D-9E47-7750417033A4@gmail.com>

I'm glad it's working for you. Let us know if anything else comes up.

--Carson


From mayabritstein at gmail.com  Mon Jan 23 01:30:40 2017
From: mayabritstein at gmail.com (Maya Britstein)
Date: Mon, 23 Jan 2017 10:30:40 +0200
Subject: [maker-devel] Authorization failed.
Message-ID: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>

Hi,

I can't access the maker-devel archives. I am entering my email, and what I
think is my password, but still it doesn't work.

thanks,

Maya

From bmoore at genetics.utah.edu  Mon Jan 23 05:43:53 2017
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon, 23 Jan 2017 12:43:53 +0000
Subject: [maker-devel] Authorization failed.
In-Reply-To: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>
References: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>
Message-ID: <E0148C3A-ACD6-49B2-A39C-C8393D0E9CEA@genetics.utah.edu>

Hi Maya,

If you follow the link below, you will find at the bottom of the page a portion of the form that allows you to reset your password. It's a little misleading because it looks like it's only an "Unsubscribe" option, but it also takes you to a page that allows you to update your subscription details, including password reminder/reset. The actual text for the portion of the page you're looking for is this:

'To unsubscribe from maker-devel, get a password reminder, or change your subscription options enter your subscription email address:'

The link is:

http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Regards,

Barry


From daren.card at gmail.com  Tue Jan 24 07:06:22 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Tue, 24 Jan 2017 08:06:22 -0600
Subject: [maker-devel] Maker error: Invalid nucleotide
Message-ID: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>

Hi everyone,

I'm getting an error with an ongoing MAKER run that I'm trying to troubleshoot. This is a second MAKER run: I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this run. The issue appears to be with augustus, and I'm getting the following type of error message for each contig:

...
Widget::augustus:
/opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus
#-------------------------------#

/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
	Invalid nucleotide '8' encountered.


/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
	Invalid nucleotide '8' encountered.

ERROR: Augustus failed
--> rank=7, hostname=moonunit0
ERROR: Failed while preparing ab-inits
ERROR: Chunk failed at level:0, tier_type:2
FAILED CONTIG:scaffold-92

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold-92

examining contents of the fasta file and run log
...

Augustus is apparently encountering '8' nucleotides, which is weird. I've looked within the contig fasta file in /tmp/ and there are no '8's anywhere except the header lines. Everything else appears to be running without issues.
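
The manual check described above (no '8' characters outside the header lines) can also be scripted. This is a minimal Python sketch with a hypothetical function name; the set of accepted characters is an assumption (IUPAC nucleotide codes in either case) and may not match exactly what Augustus accepts.

```python
# Assumed alphabet: IUPAC nucleotide codes, upper and lower case.
VALID_NUCLEOTIDES = set("ACGTUNRYKMSWBDHVacgtunrykmswbdhv")


def find_invalid_characters(fasta_lines, valid=VALID_NUCLEOTIDES):
    """Return (record_id, line_number, column, character) for every
    character in a sequence line that is not in `valid`."""
    problems = []
    record = None
    for lineno, line in enumerate(fasta_lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Header line: remember the record ID, do not scan it.
            parts = line[1:].split()
            record = parts[0] if parts else ""
            continue
        for col, ch in enumerate(line, start=1):
            if ch not in valid:
                problems.append((record, lineno, col, ch))
    return problems
```

An empty result for the file that Augustus reads would suggest that the offending character comes from somewhere other than the sequence lines of that file.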

Any guidance on how I might further interpret and solve this issue would be greatly appreciated. Can provide more information if necessary.

Thanks,
Daren Card

UT-Arlington


From carsonhh at gmail.com  Wed Jan 25 10:37:50 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Jan 2017 10:37:50 -0700
Subject: [maker-devel] Maker error: Invalid nucleotide
In-Reply-To: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>
References: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>
Message-ID: <5E13AB7E-9175-4440-AD62-A53BD9DD8DE1@gmail.com>

Try running the contig in question (scaffold-92) as a separate MAKER run. That may help indicate whether the issue is a corrupt intermediate file (if it is, you can set clean_try=1 to force deletion of intermediate files before the rerun).

--Carson
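
One way to isolate a single contig for such a separate run is to copy its record out of the multi-FASTA genome file. The sketch below is an illustration in Python, not a MAKER utility; the function name is hypothetical, and it assumes the contig ID is the first whitespace-delimited token of the header line.

```python
def extract_record(fasta_lines, wanted_id):
    """Return the header and sequence lines of the record whose ID
    (first token after '>') equals `wanted_id`."""
    kept = []
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            # Start keeping lines at the wanted header, stop at the next one.
            keep = bool(parts) and parts[0] == wanted_id
        if keep:
            kept.append(line)
    return kept
```

The returned lines can be written to a new file, which is then used as the genome input of the separate run.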




From scott at scottcain.net  Wed Jan 25 13:23:02 2017
From: scott at scottcain.net (Scott Cain)
Date: Wed, 25 Jan 2017 15:23:02 -0500
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
Message-ID: <CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>

Hi Maya,

I'm not sure what MAKER's requirements are in this regard--I'm forwarding
this to their mailing list.

Scott


On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com>
wrote:

> Hi,
>
> I have RNA-seq data, and genomic data that I want to annotate using maker.
>
> From what I understood, I need to generate a GFF3 file from the
> RNA-seq mapping results. I mapped the RNA sequences to the genome
> using bowtie and tophat. However, I still do not know how to take these
> formats and convert them to a GFF3 file that I can then use in MAKER as
> annotation evidence.
>
> I saw the wiki page, that did not mention how to make this conversion (
> http://gmod.org/wiki/GFF3)
>
> Can you please help me?
>
> Sincerely,
> Maya
>
> ----
> Maya Britstein
> Ph.D candidate
> Laura Steindler's Lab
> Marine Biology Department
> Leon H. Charney School of Marine Sciences
> University of Haifa, Israel
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu  Wed Jan 25 15:03:51 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 25 Jan 2017 22:03:51 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
Message-ID: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>

If I recall correctly, starting from a BAM file you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for reference-based assembly. But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. removing low-confidence transcripts) and bringing the result in as EST evidence.

chris

From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of Scott Cain <scott at scottcain.net>
Date: Wednesday, January 25, 2017 at 2:23 PM
To: Maya Britstein <mayabritstein at gmail.com>
Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "help at gmod.org" <help at gmod.org>
Subject: Re: [maker-devel] GFF3 file format

Hi Maya,

I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list.

Scott


On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com> wrote:
Hi,

I have RNA-seq data, and genomic data that I want to annotate using maker.

From what I understood, I need to generate a gff3 file from the RNA-seq mappings. I mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to take these files and convert them to a gff3 file that I can then use in maker as annotation evidence.

I saw the wiki page, but it did not mention how to make this conversion (http://gmod.org/wiki/GFF3).

Can you please help me?

Sincerely,
Maya

----
Maya Britstein
Ph.D candidate
Laura Steindler's Lab
Marine Biology Department
Leon H. Charney School of Marine Sciences
University of Haifa, Israel


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research



From qwzhang0601 at gmail.com  Thu Jan 26 13:26:42 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Thu, 26 Jan 2017 15:26:42 -0500
Subject: [maker-devel] canonical protein sequences or isoform?
Message-ID: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>

Hello:

I am annotating a new genome and collecting proteins from mouse. I
found there are both canonical protein sequences and isoforms. I wonder
whether I should use only the canonical protein sequences or both the
canonical sequences and the isoforms?

Thanks

Best
Quanwei

From rainer.rutka at uni-konstanz.de  Fri Jan 27 03:31:40 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Fri, 27 Jan 2017 11:31:40 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
Message-ID: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>

Hi everybody.

My name is Rainer. I am an administrator for the HPC systems at our
university in Konstanz, Baden-Wuerttemberg, Germany.
The project is called bwHPC-C5.

See: https://www.bwhpc-c5.de/en/index.php

I have been trying to get MAKER running on our bwUniCluster for weeks.
Unfortunately, I get errors while running a MAKER job in the MPI environment.

BUILD STATUS

==============================================================================
STATUS MAKER v2.31.9
==============================================================================
PERL Dependencies: VERIFIED
External Programs: VERIFIED
External C Libraries: VERIFIED
MPI SUPPORT: ENABLED
MWAS Web Interface: DISABLED
MAKER PACKAGE: CONFIGURATION OK

MODULES / INCLUDES / COMPILERS

# knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9
#
##### (B) Dependencies:
#
# conflict: any other maker version
# module load compiler/gnu/5.2
# module load mpi/openmpi/2.0-gnu-5.2
[...]

MPI/MOAB SUBMIT

[...]
### Queues ###
#MSUB -q fat
#MSUB -l nodes=1:ppn=16
#MSUB -l mem=20gb
#MSUB -l walltime=50:00:00
#
[...]
echo " "
echo "### Loading MAKER module:"
echo " "
module load bio/maker/2.31.9
[ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
echo "MAKER_VERSION = $MAKER_VERSION"
module list
[...]
echo " "
echo "### Running Maker example"
echo " "
export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so
export OMPI_MCA_mpi_warn_on_fork=0

echo "LD_PRELOAD=${LD_PRELOAD}"
#
# "STATUS: Processing and indexing input FASTA files..."
#
mpiexec -mca btl ^openib -n 16 maker
[...]


E R R O R S
=======
[...]
LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[uc1n338:113607] *** Process received signal ***
[uc1n338:113607] Signal: Segmentation fault (11)
[uc1n338:113607] Signal code: Address not mapped (1)
[uc1n338:113607] Failing at address: 0x4b0
[uc1n338:113608] *** Process received signal ***
[uc1n338:113608] Signal: Segmentation fault (11)
[uc1n338:113608] Signal code: Address not mapped (1)
[uc1n338:113608] Failing at address: 0x4b0
[uc1n338:113621] *** Process received signal ***
[uc1n338:113621] Signal: Segmentation fault (11)
[uc1n338:113621] Signal code: Address not mapped (1)
[uc1n338:113621] Failing at address: 0x4b0
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[...]

What is wrong here?

Thank you for your help!

All the best ,

Rainer

-- 
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413


From michael.s.campbell1 at gmail.com  Fri Jan 27 08:36:11 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 27 Jan 2017 10:36:11 -0500
Subject: [maker-devel] canonical protein sequences or isoform?
In-Reply-To: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>
References: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>
Message-ID: <C9A931ED-273F-4B67-B9C2-32C86166312C@gmail.com>

I give MAKER all isoforms as evidence.

Mike
> On Jan 26, 2017, at 3:26 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Hello:
> 
> I am annotating a new genome and collecting proteins from mouse. I found there are both canonical protein sequences and isoforms. I wonder whether I should use only the canonical protein sequences or both the canonical sequences and the isoforms?
> 
> Thanks
> 
> Best
> Quanwei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From qwzhang0601 at gmail.com  Fri Jan 27 09:13:22 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Fri, 27 Jan 2017 11:13:22 -0500
Subject: [maker-devel] transcript assembly of RNA-seq data
Message-ID: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>

Hello:

I wonder which is the best way to make use of RNA-seq data for gene
annotation of a new genome assembly.
(1) De novo assembly without mapping to any genome assembly (like Trinity)?
(2) TopHat+Cufflinks, mapping to the new genome assembly that I want to
annotate?
(3) TopHat+Cufflinks, mapping to a closely related annotated genome (like
mouse or human)?

Thanks

Best
Quanwei

From carsonhh at gmail.com  Fri Jan 27 09:23:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 09:23:40 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
Message-ID: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>

(1) De novo assembly without mapping to any genome assembly (like Trinity)

You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.
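
A minimal sketch of a de novo Trinity run with `--jaccard_clip`, assuming paired-end FASTQ input (the option needs paired reads). The read file names, memory limit, CPU count, and output directory are hypothetical. The command is printed for review and only executed when `RUN=1` is set:

```shell
# Hypothetical read files and resource limits; adjust to your data.
LEFT=reads_1.fq
RIGHT=reads_2.fq
TRINITY_CMD="Trinity --seqType fq --left $LEFT --right $RIGHT --max_memory 50G --CPU 16 --jaccard_clip --output trinity_out"

# Print the command for review; set RUN=1 to execute it.
echo "$TRINITY_CMD"
if [ "${RUN:-0}" = "1" ]; then
    # Unquoted on purpose so the string splits into arguments
    # (safe here because none of the paths contain spaces).
    $TRINITY_CMD
fi
```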

--Carson


> On Jan 27, 2017, at 9:13 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Hello: 
> 
> I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly. 
> (1) De novo assembly without mapping to any genome assembly (like Trinity)?
> (2) TopHat+Cufflinks, mapping to the new genome assembly that I want to annotate?
> (3) TopHat+Cufflinks, mapping to a closely related annotated genome (like mouse or human)?
> 
> Thanks
> 
> Best
> Quanwei
>  


From cjfields at illinois.edu  Fri Jan 27 15:21:15 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jan 2017 22:21:15 +0000
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
	<4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
Message-ID: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>

Yup, I agree.  Carson, would you know of any instances where HISAT2/STAR+StringTie or reference-based Trinity assemblies were (successfully) used?

chris

From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of Carson Holt <carsonhh at gmail.com>
Date: Friday, January 27, 2017 at 10:23 AM
To: Quanwei Zhang <qwzhang0601 at gmail.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] transcript assembly of RNA-seq data

(1) De novo assembly without mapping to any genome assembly (like Trinity)

You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.

--Carson


On Jan 27, 2017, at 9:13 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:

Hello:

I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly.
(1) De novo assembly without mapping to any genome assembly (like Trinity)?
(2) TopHat+Cufflinks, mapping to the new genome assembly that I want to annotate?
(3) TopHat+Cufflinks, mapping to a closely related annotated genome (like mouse or human)?

Thanks

Best
Quanwei




From carsonhh at gmail.com  Fri Jan 27 17:53:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 17:53:10 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
	<4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
	<90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
Message-ID: <DA117F8A-20D0-4F99-96E5-CFF4FDAB1799@gmail.com>

No. My experience has just been with regular Trinity de novo assembly. Of course, I'd be interested in anyone else's attempt at this though.

--Carson


> On Jan 27, 2017, at 3:21 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> 
> Yup I agree.  Carson, would you know of any instances where HiSAT2/STAR+Stringtie or reference-based Trinity assemblies were (successfully) used?  
> 
> chris
> 
> From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of Carson Holt <carsonhh at gmail.com>
> Date: Friday, January 27, 2017 at 10:23 AM
> To: Quanwei Zhang <qwzhang0601 at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] transcript assembly of RNA-seq data
> 
>> (1) De novo assembly without mapping to any genome assembly (like Trinity)
>> 
>> You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.
>> 
>> --Carson
>> 
>> 
>>> On Jan 27, 2017, at 9:13 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>>> 
>>> Hello: 
>>> 
>>> I wonder which is the best way to make use of RNA-seq data for gene annotation of a new genome assembly. 
>>> (1) De novo assembly without mapping to any genome assembly (like Trinity)?
>>> (2) TopHat+Cufflinks, mapping to the new genome assembly that I want to annotate?
>>> (3) TopHat+Cufflinks, mapping to a closely related annotated genome (like mouse or human)?
>>> 
>>> Thanks
>>> 
>>> Best
>>> Quanwei
>>>  
>> 
> 
> 
> 


From carsonhh at gmail.com  Sat Jan 28 13:53:45 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 28 Jan 2017 13:53:45 -0700
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
References: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
Message-ID: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>

Try adding one of the following to your mpiexec command:

1. --mca btl ^openib
2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0

One of these may fix your issue.  The first tells OpenMPI not to use the InfiniBand communication option (InfiniBand libraries use registered memory in a way that causes system calls to generate segfaults); it will usually force communication to go over another adapter. The second still uses the InfiniBand adapter, but runs TCP over InfiniBand (a way to indirectly bypass the problem-causing libraries). The third specifically forces the use of the Ethernet adapter instead of the InfiniBand adapter.
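
A minimal sketch of how the three variants could be swapped in and out of a job script, assuming 16 ranks as in the original submission (`ppn=16`). The `VARIANT` and `RUN` environment variables are hypothetical switches introduced only for this example, and the interface names `ib0` and `eth0` may differ on a given cluster. Note that the original job script already passes `-mca btl ^openib`, so variants 2 and 3 are probably the ones worth testing there:

```shell
NP=16                    # number of MPI ranks, as in the original job script
VARIANT=${VARIANT:-1}    # which of the three suggestions to try

case "$VARIANT" in
    1) MCA_OPTS="--mca btl ^openib" ;;                                      # skip the openib transport
    2) MCA_OPTS="--mca btl vader,tcp,self --mca btl_tcp_if_include ib0" ;;  # TCP over the InfiniBand interface
    3) MCA_OPTS="--mca btl vader,tcp,self --mca btl_tcp_if_include eth0" ;; # TCP over Ethernet
    *) echo "unknown VARIANT: $VARIANT" >&2; exit 1 ;;
esac

MPI_CMD="mpiexec $MCA_OPTS -n $NP maker"

# Print the command for review; set RUN=1 to execute it.
echo "$MPI_CMD"
if [ "${RUN:-0}" = "1" ]; then
    $MPI_CMD
fi
```

Trying one variant per job keeps it clear which setting removed the segfault.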

--Carson


> On Jan 27, 2017, at 3:31 AM, Rainer Rutka <rainer.rutka at uni-konstanz.de> wrote:
> 
> Hi everybody.
> 
> My name is Rainer. I am an administrator for the HPC systems at our
> university in Konstanz, Baden-Wuerttemberg, Germany.
> The project is called bwHPC-C5.
> 
> See: https://www.bwhpc-c5.de/en/index.php
> 
> I have been trying to get MAKER running on our bwUniCluster for weeks.
> Unfortunately, I get errors while running a MAKER job in the MPI environment.
> 
> BUILD STATUS
> 
> ==============================================================================
> STATUS MAKER v2.31.9
> ==============================================================================
> PERL Dependencies: VERIFIED
> External Programs: VERIFIED
> External C Libraries: VERIFIED
> MPI SUPPORT: ENABLED
> MWAS Web Interface: DISABLED
> MAKER PACKAGE: CONFIGURATION OK
> 
> MODULES / INCLUDES / COMPILERS
> 
> # knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9
> #
> ##### (B) Dependencies:
> #
> # conflict: any other maker version
> # module load compiler/gnu/5.2
> # module load mpi/openmpi/2.0-gnu-5.2
> [...]
> 
> MPI/MOAB SUBMIT
> 
> [...]
> ### Queues ###
> #MSUB -q fat
> #MSUB -l nodes=1:ppn=16
> #MSUB -l mem=20gb
> #MSUB -l walltime=50:00:00
> #
> [...]
> echo " "
> echo "### Loading MAKER module:"
> echo " "
> module load bio/maker/2.31.9
> [ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
> echo "MAKER_VERSION = $MAKER_VERSION"
> module list
> [...]
> echo " "
> echo "### Running Maker example"
> echo " "
> export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so
> export OMPI_MCA_mpi_warn_on_fork=0
> 
> echo "LD_PRELOAD=${LD_PRELOAD}"
> #
> # "STATUS: Processing and indexing input FASTA files..."
> #
> mpiexec -mca btl ^openib -n 16 maker
> [...]
> 
> 
> E R R O R S
> =======
> [...]
> LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> [uc1n338:113607] *** Process received signal ***
> [uc1n338:113607] Signal: Segmentation fault (11)
> [uc1n338:113607] Signal code: Address not mapped (1)
> [uc1n338:113607] Failing at address: 0x4b0
> [uc1n338:113608] *** Process received signal ***
> [uc1n338:113608] Signal: Segmentation fault (11)
> [uc1n338:113608] Signal code: Address not mapped (1)
> [uc1n338:113608] Failing at address: 0x4b0
> [uc1n338:113621] *** Process received signal ***
> [uc1n338:113621] Signal: Segmentation fault (11)
> [uc1n338:113621] Signal code: Address not mapped (1)
> [uc1n338:113621] Failing at address: 0x4b0
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> [...]
> 
> What is wrong here?
> 
> Thank you for your help!
> 
> All the best ,
> 
> Rainer
> 
> -- 
> Rainer Rutka
> University of Konstanz
> Communication, Information, Media Centre (KIM)
> * High-Performance-Computing (HPC)
> * KIM-Support and -Base-Services
> Room: V511
> 78457 Konstanz, Germany
> +49 7531 88-5413
> 


From rainer.rutka at uni-konstanz.de  Mon Jan 30 01:32:08 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Mon, 30 Jan 2017 09:32:08 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
References: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
	<73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
Message-ID: <c89c60e5-1162-1297-5d71-99b1cbf315ec@uni-konstanz.de>

Hi Carson!

Thank you VERY MUCH for your hints.

Much appreciated!

I'll test these today and let you know about the results.

Again: THANKS! :-)

BTW: I'm not a scientist. Only a system operator.

:-)

Am 28.01.2017 um 21:53 schrieb Carson Holt:
> Try adding one of the following to your mpiexec command:
> 1. --mca btl ^openib
> 2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
> 3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0
> One of these may fix your issue.  The first tells OpenMPI not to use the InfiniBand communication option (InfiniBand libraries use registered memory in a way that causes system calls to generate segfaults); it will usually force communication to go over another adapter. The second still uses the InfiniBand adapter, but runs TCP over InfiniBand (a way to indirectly bypass the problem-causing libraries). The third specifically forces the use of the Ethernet adapter instead of the InfiniBand adapter.
> --Carson

-- 
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413


From qwzhang0601 at gmail.com  Tue Jan 31 10:36:13 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 12:36:13 -0500
Subject: [maker-devel] collecting protein sequences as evidences
Message-ID: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>

I wonder what's the best way to collect protein sequences for gene
annotation of a de novo genome assembly.
(1) My first choice is to get protein sequences of human and mouse from
UniProt. At this step, I am not clear whether I should download the
reviewed ones (i.e., Swiss-Prot) or the automatically annotated ones (i.e.,
TrEMBL).
(2) On the other hand, I also get protein sequences from NCBI. Should I
simply merge those fasta files? Does it matter if there are
redundancies? Also, if I get protein sequences from different sources,
they may not have the same quality. Do I need to do something before I
integrate protein sequences from different sources?

Many thanks

Best
Quanwei

From qwzhang0601 at gmail.com  Tue Jan 31 12:08:21 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 14:08:21 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from different
	tissues and individuals
Message-ID: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>

Hello:

I am trying to assemble transcripts from RNA-seq data with Trinity, to be
used as evidence for gene annotation in Maker. I have data from two
tissues with two replicates each. Should I merge all four samples to get
one assembly file? Or should I merge the replicates of each tissue
separately and use the two assembly files as input to Maker? Merging all
samples into one gives a much higher coverage level, but I think some
genes may be expressed as tissue-specific isoforms, so I am not sure
whether I should merge RNA-seq from different tissues.
What's more, I found some published RNA-seq data from another individual
(and from a different tissue than ours) for the same species. Should I
merge all RNA-seq together (across individuals and tissues)? Or should I
generate separate transcript assemblies and use all of them as
input to Maker?

Thanks
Best
Quanwei

From michael.s.campbell1 at gmail.com  Tue Jan 31 12:26:29 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 14:26:29 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from
	different tissues and individuals
In-Reply-To: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
Message-ID: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>

I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER as a comma-separated list in the opts file.

Example: 
est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta
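
A minimal sketch of the per-tissue layout, assuming paired-end reads and Trinity's ability to take comma-separated read files for `--left` and `--right`. The tissue names, read file names, and resource limits are hypothetical, and the location of the final `Trinity.fasta` varies between Trinity versions. The commands are only printed, followed by the resulting `est=` line:

```shell
# Hypothetical layout: two tissues, two replicates each, paired-end reads.
EST_LIST=""
for tissue in liver brain; do
    # Replicates of the same tissue go into one assembly.
    left="${tissue}_rep1_1.fq,${tissue}_rep2_1.fq"
    right="${tissue}_rep1_2.fq,${tissue}_rep2_2.fq"
    echo "Trinity --seqType fq --left $left --right $right --max_memory 50G --CPU 16 --output trinity_${tissue}"
    # Tissues stay separate; collect one assembly per tissue for MAKER.
    EST_LIST="${EST_LIST:+$EST_LIST,}trinity_${tissue}/Trinity.fasta"
done

# The line to set in the MAKER opts file.
echo "est=$EST_LIST"
```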

Good luck,
Mike

> On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Hello:
> 
> I am trying to assemble transcripts from RNA-seq data with Trinity, to be used as evidence for gene annotation in Maker. I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to Maker? Merging all samples into one gives a much higher coverage level, but I think some genes may be expressed as tissue-specific isoforms, so I am not sure whether I should merge RNA-seq from different tissues.
> What's more, I found some published RNA-seq data from another individual (and from a different tissue than ours) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate separate transcript assemblies and use all of them as input to Maker?
> 
> Thanks
> Best
> Quanwei


From michael.s.campbell1 at gmail.com  Tue Jan 31 13:57:28 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 15:57:28 -0500
Subject: [maker-devel] collecting protein sequences as evidences
In-Reply-To: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>
References: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>
Message-ID: <2E4D90C9-6D6E-4F52-A361-AFB06A61D2C2@gmail.com>

Hi Quanwei,

(1) When I use UniProt I use Swiss-Prot and not TrEMBL.
(2) I don't merge files together. I just pass them all to MAKER as a comma-separated list.
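
A minimal sketch of building that comma-separated list, assuming it goes on the `protein=` line of the MAKER opts file. The FASTA file names are hypothetical:

```shell
# Hypothetical protein FASTA files from different sources, kept as separate files.
PROTEIN_LIST=$(printf '%s\n' sprot_human.fasta sprot_mouse.fasta ncbi_proteins.fasta | paste -sd, -)

# The line to set in the MAKER opts file.
PROTEIN_LINE="protein=${PROTEIN_LIST}"
echo "$PROTEIN_LINE"
```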

Thanks,
Mike

> On Jan 31, 2017, at 12:36 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> I wonder what's the best way to collect protein sequences for gene annotation of a de novo genome assembly. 
> (1) My first choice is to get protein sequences of human and mouse from UniProt. At this step, I am not clear whether I should download the reviewed ones (i.e., Swiss-Prot) or the automatically annotated ones (i.e., TrEMBL).
> (2) On the other hand, I also get protein sequences from NCBI. Should I simply merge those fasta files? Does it matter if there are redundancies? Also, if I get protein sequences from different sources, they may not have the same quality. Do I need to do something before I integrate protein sequences from different sources?
> 
> Many thanks
> 
> Best
> Quanwei


From cjfields at illinois.edu  Tue Jan 31 14:05:43 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 21:05:43 +0000
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
Message-ID: <A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>

I agree with Mike.  I also suggest not combining RNA-seq data from different runs (e.g. different studies), even if they are from the same tissue, developmental stage, etc. There are many other factors (biological variation, sample quality, differences in sequencing chemistry or technology, etc.) that can significantly and negatively impact transcript assembly quality.

chris

On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" <maker-devel-bounces at yandell-lab.org on behalf of michael.s.campbell1 at gmail.com> wrote:

    I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file.
    
    Example: 
    est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta
    
    Good luck,
    Mike
    
    > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
    > 
    > Hello:
    > 
    > I am trying to assemble transcripts from RNA-seq data with Trinity, to be used as evidence for gene annotation in Maker. I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge the replicates of each tissue separately and use the two assembly files as input to Maker? Merging all samples into one gives a much higher coverage level, but I think some genes may be expressed as tissue-specific isoforms, so I am not sure whether I should merge RNA-seq from different tissues.
    > What's more, I found some published RNA-seq data from another individual (and from a different tissue than ours) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate separate transcript assemblies and use all of them as input to Maker?
    > 
    > Thanks
    > Best
    > Quanwei
    

From mjfi2sb3 at gmail.com  Tue Jan 31 14:14:14 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Tue, 31 Jan 2017 21:14:14 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
	<357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
Message-ID: <CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>

Hi Christopher,

How would you identify a low confidence transcript? And how do you remove
them? Also, did you try setting a minimum read coverage in Trinity as the
default is one?

Best
/SB

On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, <cjfields at illinois.edu>
wrote:

> If I recall, from a BAM you would need to run a reference-based assembly
> on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use
> Trinity for ref-based assembly.  But I always choose the route of a full de
> novo assembly (again, Trinity or similar) when possible, doing some basic
> cleanup (e.g. remove low confidence transcripts) and bring them as EST
> evidence.
>
> chris
>
> From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of
> Scott Cain <scott at scottcain.net>
> Date: Wednesday, January 25, 2017 at 2:23 PM
> To: Maya Britstein <mayabritstein at gmail.com>
> Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "
> help at gmod.org" <help at gmod.org>
> Subject: Re: [maker-devel] GFF3 file format
>
> Hi Maya,
>
> I'm not sure what MAKER's requirements are in this regard--I'm forwarding
> this to their mailing list.
>
> Scott
>
>
> On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com>
> wrote:
>
> Hi,
>
> I have RNA-seq data, and genomic data that I want to annotate using maker.
>
> From what I understood, I need to generate a GFF3 file from the
> RNA-seq mapping results. I have mapped the RNA sequences to the genome
> using bowtie and tophat. However, I still do not know how to take these
> files and convert them to a GFF3 file that I can then use in maker as
> annotation evidence.
>
> I saw the wiki page, that did not mention how to make this conversion (
> http://gmod.org/wiki/GFF3 )
>
> Can you please help me?
>
> Sincerely,
> Maya
>
> ----
> Maya Britstein
> Ph.D candidate
> Laura Steindler's Lab
> Marine Biology Department
> Leon H. Charney School of Marine Sciences
> University of Haifa, Israel
>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain
> dot net
> GMOD Coordinator (http://gmod.org/)
>                    216-392-3087
> Ontario Institute for Cancer Research
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-- 

____________________________
Sent from Inbox Mobile
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170131/b06e01be/attachment-0002.html>

From qwzhang0601 at gmail.com  Tue Jan 31 14:33:12 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 16:33:12 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
Message-ID: <CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>

Thank you all for your suggestions. So you recommend against using RNA-seq
data from another study, even if I assemble them separately and then provide
both assemblies to Maker as a comma-separated list? The issues you
mentioned do exist, but some groups have collected RNA-seq data from different
individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198).
Thank you again for your suggestions; I will think about it.

Best
Quanwei

2017-01-31 16:05 GMT-05:00 Fields, Christopher J <cjfields at illinois.edu>:

> I agree with Mike.  I also suggest not combining RNA-Seqs from different
> runs (e.g. different studies) even if they are from the same tissue,
> development stage etc. There are many other factors (biological variation,
> sample quality, sequencing chemistry or technology differences, etc) that
> can significantly and negatively impact trx assembly quality.
>
> chris
>
> On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" <
> maker-devel-bounces at yandell-lab.org on behalf of
> michael.s.campbell1 at gmail.com> wrote:
>
>     I would probably try merging the replicates but not the tissues. You
> can then pass the output files to MAKER in a comma separated list in the
> opts file.
>
>     Example:
>     est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta
>
>     Good luck,
>     Mike
>
>     > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com>
> wrote:
>     >
>     > Hello:
>     >
>     > I am trying to assemble transcripts using RNA-seq data by the tool
> Trinity, which will be used for gene annotation for Maker. Now I have data
> from two tissues with two replicates each. Should I merge all four samples
> to get one assembly file? Or should I merge replicates of each tissue
> separately and use the two assembly files as input of Maker. Merging all
> samples into one, we will have much higher coverage level, but I think
> there may be some genes expressed by tissue-specific isoforms. So I not
> sure whether I should merge RNA-seq from different tissues.
>     > What's more, I find some published RNA-seq data from another
> individual (and also for different tissue from us) for the same species.
> Should I merge all RNA-seq together (across individuals and tissues)? Or
> should I generate different transcript assembly and use all those
> assemblies as input to Maker?
>     >
>     > Thanks
>     > Best
>     > Quanwei
>     > _______________________________________________
>     > maker-devel mailing list
>     > maker-devel at box290.bluehost.com
>     > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>     _______________________________________________
>     maker-devel mailing list
>     maker-devel at box290.bluehost.com
>     http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170131/e3c2dca5/attachment-0002.html>

From carsonhh at gmail.com  Tue Jan 31 14:35:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 31 Jan 2017 14:35:20 -0700
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
	<CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
Message-ID: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>

I think he means not to combine them for the transcript assembly step (i.e. assemble them separately). But you can still provide them all to maker as a comma-separated list.
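
As a rough illustration of the comma-separated list, here is a minimal Python sketch that builds the est= line for the opts file from separately assembled transcript sets. The helper name est_line and the file names are hypothetical; only the est= key and the comma-separated format come from this thread.

```python
def est_line(assemblies):
    """Build an est= line for the MAKER opts file from separate assemblies."""
    paths = [str(p) for p in assemblies]
    for p in paths:
        # A comma inside a path would be read as a list separator.
        if "," in p:
            raise ValueError(f"path contains a comma: {p}")
    return "est=" + ",".join(paths)


# Hypothetical file names: one assembly per tissue or per study.
print(est_line([
    "/PATH/TO/tissueA_trinity.fasta",
    "/PATH/TO/tissueB_trinity.fasta",
    "/PATH/TO/published_study_trinity.fasta",
]))
```

Each assembly stays a separate file, so reads from different tissues or studies are never mixed during assembly.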

--Carson
 
> On Jan 31, 2017, at 2:33 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Thank you guys for your suggestions. So you do not suggest to use RNA-seq data from another study, even I assemble them separately and then provide both assemblies into Maker as a comma separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it.
> 
> Best
> Quanwei 
> 
> 2017-01-31 16:05 GMT-05:00 Fields, Christopher J <cjfields at illinois.edu>:
> I agree with Mike.  I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality.
> 
> chris
> 
> On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" <maker-devel-bounces at yandell-lab.org on behalf of michael.s.campbell1 at gmail.com> wrote:
> 
>     I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file.
> 
>     Example:
>     est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta
> 
>     Good luck,
>     Mike
> 
>     > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>     >
>     > Hello:
>     >
>     > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues.
>     > What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker?
>     >
>     > Thanks
>     > Best
>     > Quanwei
>     > _______________________________________________
>     > maker-devel mailing list
>     > maker-devel at box290.bluehost.com
>     > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
>     _______________________________________________
>     maker-devel mailing list
>     maker-devel at box290.bluehost.com
>     http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170131/770b0474/attachment-0002.html>

From cjfields at illinois.edu  Tue Jan 31 16:05:43 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 23:05:43 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
	<357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
	<CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>
Message-ID: <8BD384C9-4E46-42AC-A59F-96299EF5E104@illinois.edu>

You can use RSEM for some initial filtering:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification#filtering-transcripts

Then I generally use the Trinity QA steps, in particular TransRate or DETONATE:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment
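
To make the expression-based filtering idea concrete, here is a minimal Python sketch, not the Trinity or RSEM tooling itself. It assumes an RSEM isoforms.results-style tab-delimited table with transcript_id and TPM columns; the function names and the 1.0 TPM cutoff are illustrative assumptions, not recommendations from this thread.

```python
import csv


def keep_ids(rsem_table, min_tpm=1.0):
    """Return transcript IDs whose TPM is at least min_tpm.

    Assumes a tab-delimited table with 'transcript_id' and 'TPM' columns,
    as in RSEM's isoforms.results output. The cutoff is an assumption.
    """
    with open(rsem_table, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["transcript_id"] for row in reader
                if float(row["TPM"]) >= min_tpm}


def filter_fasta(fasta_in, fasta_out, ids):
    """Copy FASTA records whose ID (first word of the header) is in ids.

    Returns the number of records written.
    """
    kept = 0
    writing = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                header = line[1:].split()
                writing = bool(header) and header[0] in ids
                if writing:
                    kept += 1
            if writing:
                dst.write(line)
    return kept
```

The filtered FASTA can then be passed to MAKER as EST evidence; the quality-assessment tools linked above remain the better guide to where a sensible cutoff lies.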

chris

From: Salim Bougouffa <mjfi2sb3 at gmail.com>
Date: Tuesday, January 31, 2017 at 3:14 PM
To: Chris Fields <cjfields at illinois.edu>, Scott Cain <scott at scottcain.net>, Maya Britstein <mayabritstein at gmail.com>
Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "help at gmod.org" <help at gmod.org>
Subject: Re: [maker-devel] GFF3 file format


Hi Christopher,

How would you identify a low confidence transcript? And how do you remove them? Also, did you try setting a minimum read coverage in Trinity as the default is one?

Best
/SB

On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, <cjfields at illinois.edu> wrote:
If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for ref-based assembly.  But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. remove low confidence transcripts) and bring them as EST evidence.

chris

From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of Scott Cain <scott at scottcain.net>
Date: Wednesday, January 25, 2017 at 2:23 PM
To: Maya Britstein <mayabritstein at gmail.com>
Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "help at gmod.org" <help at gmod.org>
Subject: Re: [maker-devel] GFF3 file format

Hi Maya,

I'm not sure what MAKER's requirements are in this regard--I'm forwarding this to their mailing list.

Scott


On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com> wrote:
Hi,

I have RNA-seq data, and genomic data that I want to annotate using maker.

From what I understood, I need to generate a GFF3 file from the RNA-seq mapping results. I have mapped the RNA sequences to the genome using bowtie and tophat. However, I still do not know how to take these files and convert them to a GFF3 file that I can then use in maker as annotation evidence.

I saw the wiki page, that did not mention how to make this conversion (http://gmod.org/wiki/GFF3)

Can you please help me?

Sincerely,
Maya

----
Maya Britstein
Ph.D candidate
Laura Steindler's Lab
Marine Biology Department
Leon H. Charney School of Marine Sciences
University of Haifa, Israel


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
--

____________________________
Sent from Inbox Mobile
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170131/25e09f2b/attachment-0002.html>

From cjfields at illinois.edu  Tue Jan 31 16:07:44 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 23:07:44 +0000
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
	<CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
	<656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>
Message-ID: <CAE4C80D-DD8F-4A2C-A33B-535456D233AE@illinois.edu>

Exactly

chris

From: Carson Holt <carsonhh at gmail.com>
Date: Tuesday, January 31, 2017 at 3:35 PM
To: Quanwei Zhang <qwzhang0601 at gmail.com>
Cc: Chris Fields <cjfields at illinois.edu>, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Transcript assembly of RNA-seq data from different tissues and individuals

I think he means not to combine them for the transcript assembly step (i.e. assemble them separately). But you can still provide them all to maker as a comma-separated list.

--Carson

On Jan 31, 2017, at 2:33 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:

Thank you guys for your suggestions. So you do not suggest to use RNA-seq data from another study, even I assemble them separately and then provide both assemblies into Maker as a comma separated list. The issues you mentioned do exist, but some people did collect RNA-seq data from different individuals and used them for gene annotation (e.g., doi:10.1038/ng.3198). But thank you for your suggestions, I will think about it.
Best
Quanwei

2017-01-31 16:05 GMT-05:00 Fields, Christopher J <cjfields at illinois.edu>:
I agree with Mike.  I also suggest not combining RNA-Seqs from different runs (e.g. different studies) even if they are from the same tissue, development stage etc. There are many other factors (biological variation, sample quality, sequencing chemistry or technology differences, etc) that can significantly and negatively impact trx assembly quality.

chris

On 1/31/17, 1:26 PM, "maker-devel on behalf of Michael Campbell" <maker-devel-bounces at yandell-lab.org on behalf of michael.s.campbell1 at gmail.com> wrote:

    I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER in a comma separated list in the opts file.

    Example:
    est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta

    Good luck,
    Mike

    > On Jan 31, 2017, at 2:08 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
    >
    > Hello:
    >
    > I am trying to assemble transcripts using RNA-seq data by the tool Trinity, which will be used for gene annotation for Maker. Now I have data from two tissues with two replicates each. Should I merge all four samples to get one assembly file? Or should I merge replicates of each tissue separately and use the two assembly files as input of Maker. Merging all samples into one, we will have much higher coverage level, but I think there may be some genes expressed by tissue-specific isoforms. So I not sure whether I should merge RNA-seq from different tissues.
    > What's more, I find some published RNA-seq data from another individual (and also for different tissue from us) for the same species. Should I merge all RNA-seq together (across individuals and tissues)? Or should I generate different transcript assembly and use all those assemblies as input to Maker?
    >
    > Thanks
    > Best
    > Quanwei
    > _______________________________________________
    > maker-devel mailing list
    > maker-devel at box290.bluehost.com
    > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


    _______________________________________________
    maker-devel mailing list
    maker-devel at box290.bluehost.com
    http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170131/c78c9df7/attachment-0002.html>

From rob.syme at gmail.com  Wed Jan  4 23:41:25 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Thu, 05 Jan 2017 06:41:25 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
Message-ID: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>

Hi all

The MAKER wiki page "Repeat Library Construction - Advanced
<http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced>"
describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded
MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there.
Are they distributed with MAKER or separately. Does anybody know where to
find them?

Thanks!

Rob Syme
Research Associate
Curtin University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170105/299fabc3/attachment-0003.html>

From olegl at volcani.agri.gov.il  Thu Jan  5 03:07:31 2017
From: olegl at volcani.agri.gov.il (Oleg Lovky)
Date: Thu, 5 Jan 2017 10:07:31 +0000
Subject: [maker-devel] Unable to train SNAP
Message-ID: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>

Hello,

I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000).
However, I'm not getting transcript and proteins fasta files at all, despite Maker not giving any errors and everything is listed as finished in the datastore log file.
Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files.

Please advise.

Regards,

Oleg Lovky, MSc.
Research Engineer
Institute of Plant Sciences
ARO, Volcani Center
Cell: 054-4870319
[v95_15]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170105/90c174a8/attachment-0003.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 16191 bytes
Desc: image001.png
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170105/90c174a8/attachment-0003.png>

From michael.s.campbell1 at gmail.com  Thu Jan  5 07:54:17 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 5 Jan 2017 09:54:17 -0500
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <3B3F80CA-BFA1-4F0E-A2F1-CA60E8496D5F@gmail.com>

Hi Rob,

There is a link near the bottom of that wiki page, at the end of this line:

"CRL and other custom scripts are available here."

That points to this URL: http://www.hrt.msu.edu/uploads/535/78637/CRL_Scripts1.0.tar.gz

Thanks,
Mike
> On Jan 5, 2017, at 1:41 AM, Rob Syme <rob.syme at gmail.com> wrote:
> 
> Hi all
> 
> The MAKER wiki page "Repeat Library Construction - Advanced <http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced>" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately. Does anybody know where to find them?
> 
> Thanks!
> 
> Rob Syme
> Research Associate
> Curtin University
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170105/220eecad/attachment-0003.html>

From rob.syme at gmail.com  Thu Jan  5 18:29:35 2017
From: rob.syme at gmail.com (Rob Syme)
Date: Fri, 06 Jan 2017 01:29:35 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <CAEf4xgcZXf18ZWD9JusvrsyUdLCg_wOe2SuA2d91mnTcug+u1w@mail.gmail.com>

Oh dear. That's embarrassing for me! Sorry for the silly question.

-r

On Thu, 5 Jan 2017 at 14:41 Rob Syme <rob.syme at gmail.com> wrote:

> Hi all
>
> The MAKER wiki page "Repeat Library Construction - Advanced
> <http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced>"
> describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded
> MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there.
> Are they distributed with MAKER or separately. Does anybody know where to
> find them?
>
> Thanks!
>
> Rob Syme
> Research Associate
> Curtin University
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170106/ad9453c6/attachment-0003.html>

From xvazquezc at gmail.com  Thu Jan  5 19:23:17 2017
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Fri, 6 Jan 2017 13:23:17 +1100
Subject: [maker-devel] Unable to train SNAP
In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
Message-ID: <CAL0hg4EEQk5CWkrni6-o29m_mAOkYjLKqjA8Df04FKJMbfDB8g@mail.gmail.com>

Are you using the -n option with maker2zff? You often get empty genome.ann
and genome.dna files if you don't.

On 5 January 2017 at 21:07, Oleg Lovky <olegl at volcani.agri.gov.il> wrote:

> Hello,
>
>
>
> I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1)
> containing ~50k reads (length ranges from 70 to 12000).
>
> However, I'm not getting transcript and proteins fasta files at all,
> despite Maker not giving any errors and everything is listed as finished in
> the datastore log file.
>
> Furthermore, when trying to use maker2zff I'm getting empty genome.ann and
> genome.dna files.
>
>
>
> Please advise.
>
>
>
> Regards,
>
>
>
> Oleg Lovky, MSc.
>
> Research Engineer
>
> Institute of Plant Sciences
>
> ARO, Volcani Center
>
> Cell: 054-4870319
>
> [image: v95_15]
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170106/7dceb9af/attachment-0003.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 16191 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170106/7dceb9af/attachment-0003.png>

From carsonhh at gmail.com  Fri Jan  6 12:28:02 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 6 Jan 2017 12:28:02 -0700
Subject: [maker-devel] Unable to train SNAP
In-Reply-To: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
References: <4BC28864194F044B9A7A4A07D7ED222A38BD44BC@MBX02.ARO.local>
Message-ID: <8F65E561-7450-4B5A-8F1B-4E51C0D25BE2@gmail.com>

The maker2zff script has a number of thresholds that must be reached to avoid filtering all models. If you don't have protein evidence in the dataset, for example, then that filter may always be failing. You may just want to turn all filters off with the -n option as previously suggested.
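
A minimal Python sketch of that suggestion follows: it builds a maker2zff call with -n and then checks whether genome.ann and genome.dna came out empty. The GFF3 file name is hypothetical, the helper names are made up for illustration, and the assumption that maker2zff takes the GFF3 path as its last argument and writes both files to the current directory should be checked against your installed version.

```python
import os
import subprocess


def maker2zff_cmd(gff3, no_filter=True):
    """Return the argument list for a maker2zff call.

    With no_filter=True the -n option discussed above is added.
    """
    cmd = ["maker2zff"]
    if no_filter:
        cmd.append("-n")
    cmd.append(gff3)
    return cmd


def empty_outputs(directory=".", names=("genome.ann", "genome.dna")):
    """Return the expected output files that are missing or zero bytes."""
    bad = []
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            bad.append(name)
    return bad


def run_and_check(gff3):
    """Run maker2zff without filters and report empty outputs."""
    subprocess.run(maker2zff_cmd(gff3), check=True)
    return empty_outputs()


# Hypothetical usage, with a made-up GFF3 name:
# print(run_and_check("my_genome.all.gff"))
```

If the files are still empty with -n, the GFF3 itself probably contains no gene models, which would match the missing transcript and protein FASTA files.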

--Carson


> On Jan 5, 2017, at 3:07 AM, Oleg Lovky <olegl at volcani.agri.gov.il> wrote:
> 
> Hello,
>  
> I'm running Maker (2.31.8) with a genome and mRNA evidence (est2genome=1) containing ~50k reads (length ranges from 70 to 12000).
> However, I'm not getting transcript and proteins fasta files at all, despite Maker not giving any errors and everything is listed as finished in the datastore log file.
> Furthermore, when trying to use maker2zff I'm getting empty genome.ann and genome.dna files.
>  
> Please advise.
>  
> Regards,
>  
> Oleg Lovky, MSc.
> Research Engineer
> Institute of Plant Sciences
> ARO, Volcani Center
> Cell: 054-4870319
> <image001.png>
>  
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170106/823f4e75/attachment-0003.html>

From kchilds at msu.edu  Thu Jan  5 07:28:00 2017
From: kchilds at msu.edu (Childs, Kevin)
Date: Thu, 5 Jan 2017 14:28:00 +0000
Subject: [maker-devel] Repeat library construction - CRL scripts
In-Reply-To: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
References: <CAEf4xgf5uFYZ4fGv_N2dVaD6MM4XpVE7P9=1UeTKUmwKM5NTVw@mail.gmail.com>
Message-ID: <6AE4044B-9011-4421-A6F1-FE3B95BBB11D@msu.edu>

Rob,

The scripts can be found in a link at the bottom of this wiki page:

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced

Kevin Childs

---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Center for Genomics-Enabled Plant Science
Plant Biology Department
Michigan State University

kchilds at msu.edu
517-775-2844 (m)
517-884-6926 (o)

http://childslab.plantbiology.msu.edu


> On Jan 5, 2017, at 1:41 AM, Rob Syme <rob.syme at gmail.com> wrote:
> 
> Hi all
> 
> The MAKER wiki page "Repeat Library Construction - Advanced" describes running scripts CRL_Step1.pl, CRL_Step2.pl, etc. I've downloaded MAKER versions 2.31.8 and 3.0.0, but these scripts don't seem to be there. Are they distributed with MAKER or separately. Does anybody know where to find them?
> 
> Thanks!
> 
> Rob Syme
> Research Associate
> Curtin University
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From brubin at fieldmuseum.org  Fri Jan  6 18:22:10 2017
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Fri, 6 Jan 2017 20:22:10 -0500
Subject: [maker-devel] /tmp full
Message-ID: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>

Hi all,

Maker keeps filling up the /tmp directories on the cluster I am using. It
appears that most of the space is taken with many versions of various blast
databases. I suspect that this issue is partly due to my not using MPI and
instead launching multiple instances of maker (typically 16) in the same
working directory. However, it appears that maker is also leaving some of
these databases in /tmp even after it has died or been killed and they are
piling up.

I am submitting my jobs to the cluster via SLURM but have installed maker
locally rather than system-wide. My system administrator is going to try
creating a larger locally mounted directory on some of the nodes for me but
I wanted to check to see if you have any other suggestions to solve the
issue or make sure that maker cleans up /tmp as aggressively as possible.

I am using maker3-beta.

Thanks for any help,
Ben

From carsonhh at gmail.com  Sat Jan  7 16:29:29 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 7 Jan 2017 16:29:29 -0700
Subject: [maker-devel] /tmp full
In-Reply-To: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
References: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
Message-ID: <DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>

If you use the MPI settings, then all processes will share a single temporary directory; otherwise each will have a separate one, since they can't intercommunicate.

MAKER tries to clean up its files on finish or failure, but if you or the system kill it with certain signals, then it is reaped immediately by the system and not allowed to finish cleaning up. Signals 9 and 19 (SIGKILL and SIGSTOP on most Linux systems), for example, will do that. If a failure is related to the drive being full or a memory issue, then your system may be hitting it with one of these uncatchable signals. For example, SLURM may use signal 9 or 19 if a process fails to respond to signal 15 (SIGTERM) in a timely manner (i.e. MAKER may be removing files, but SLURM gets impatient and kills it more aggressively because it thinks the process is not responding). You can always try to empty /tmp as the first step in your batch script; that will remove files belonging to you before launching MAKER.
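
As a rough sketch of that cleanup step (not something MAKER ships), the Python below removes leftover entries that belong to the current user. The "maker_" prefix is an assumption based on temporary paths such as /tmp/maker_xnOJ4d reported on this list, and the function name is made up for illustration. Do not run it while another of your own MAKER instances is still active on the same node.

```python
import os
import shutil
import tempfile


def remove_own_entries(tmp_root=None, prefix="maker_"):
    """Delete entries under tmp_root that start with prefix and are owned by us.

    The "maker_" prefix is an assumption, not a documented MAKER guarantee.
    Returns the sorted names of the entries that were removed.
    """
    if tmp_root is None:
        tmp_root = tempfile.gettempdir()
    my_uid = os.getuid()
    removed = []
    for entry in os.scandir(tmp_root):
        if not entry.name.startswith(prefix):
            continue
        # Judge a symlink by its own owner rather than by its target.
        if entry.stat(follow_symlinks=False).st_uid != my_uid:
            continue
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path, ignore_errors=True)
        else:
            os.remove(entry.path)
        removed.append(entry.name)
    return sorted(removed)
```

In a SLURM batch script this would be called once per node before MAKER starts, pointed at whichever directory TMP refers to on that node.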

--Carson




From brubin at fieldmuseum.org  Sun Jan  8 09:24:36 2017
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Sun, 8 Jan 2017 11:24:36 -0500
Subject: [maker-devel] /tmp full
In-Reply-To: <DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>
References: <CAKpVPBLXwke7Fs656JorP-rj_jm0zm1aoLf9Z0iPGp4++K6W1w@mail.gmail.com>
	<DF892928-8AC1-4D13-AD9D-0B2C8F119153@gmail.com>
Message-ID: <CAKpVPBLfiYakZ3Ce2q02gYXatJHJzJ8dW-YMgscg9Nm6-KT03w@mail.gmail.com>

OK, thanks for the tips. Knowing the particulars of how SLURM might be
causing this is extremely helpful. I'll try to just empty /tmp before
running MAKER on each node, as you suggest. I suspect that will work but
will work on getting MPI running as well.

Thanks!
Ben



-- 
_____________________________________________________
Benjamin ER Rubin, PhD
Committee on Evolutionary Biology
University of Chicago
benrubin.org

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

From lmainzer at life.illinois.edu  Mon Jan  9 00:02:01 2017
From: lmainzer at life.illinois.edu (Liudmila Sergeevna Mainzer)
Date: Mon, 9 Jan 2017 01:02:01 -0600
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
Message-ID: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>

Hello, MAKER developers!

I tried submitting this bug report through the web form on the 
RepeatMasker web page, but I am getting an "invalid submission" message, 
so I decided to post here.

I found a weird bug that results in the notorious "index out of bounds" 
error reported by RepeatMasker. Significantly, this error only arises on 
very long file names generated by MAKER.

I traced this through the code, and found that the error originates in 
Tandem Repeats Finder. TRF sometimes splits up its output into separate 
files. When that happens, the pieces with index >1 do not contain the 
sequence name. Compare the first few lines of these two files:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.1.txt.html

<HTML><HEAD><TITLE>InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html</TITLE></HEAD><BODY 


     bgcolor="#File 1 of 2 FBF8BC"><PRE>
     Tandem Repeats Finder Program written by:
                   Gary Benson
                   Program in Bioinformatics
                   Boston University
     Version 4.09
     Sequence: InputSequencefrag-1 CHUNK number:191 size:455659 offset:57300000
     Parameters: 2 3 5 75 20 33 7

etcetera
Now compare the second file:

  head -n 20 output_folder/InputFileName_batch-1.masked.2.3.5.75.20.33.7.2.txt.html

<HTML><HEAD><TITLE>InputFileName_batch-1.masked.2.3.5.75.20.33.7.txt.html</TITLE></HEAD><BODY 


     bgcolor="#File 2 of 2 Found at i:56286 original size:1 final size:1
     <A NAME="56278--56322,1,45.0,1,1136"></A><A
     HREF="http://tandem.bu.edu/trf/trf.definitions.html#alignment" target
     ="explanation">Alignment explanation</A><BR><BR>
        Indices: 56278--56322  Score: 55
        Period size: 1  Copynumber: 45.0  Consensus size: 1

etcetera


See how one file has the full header with the "Sequence:" statement and 
the other one does not? This "Sequence:" statement is used in the 
RepeatMasker code to name each piece of sequence that ends up being 
masked later. When this variable is empty (the name string is not 
defined), the setSubstr subroutine in the main RepeatMasker code breaks: 
the length of an undefined string is of course zero, and that subroutine 
has a check for sequences whose length is shorter than the region that 
needs to be masked.

So it quits with the statement "Error index out of bounds!", even though 
the sequence is finite length, does not have any weird characters, and 
is maskable.
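
A quick way to find the affected pieces, sketched here in Python rather than taken from RepeatMasker itself: look at the first 20 lines of each TRF output file (the same window as the head commands above) and report files with no "Sequence:" line. The function name and the 20-line window are illustrative choices.

```python
def files_missing_sequence_header(paths, max_lines=20):
    """Return the paths whose first max_lines lines contain no 'Sequence:' line."""
    missing = []
    for path in paths:
        with open(path, errors="replace") as handle:
            # Read at most max_lines lines; short files are padded with "".
            head = [next(handle, "") for _ in range(max_lines)]
        if not any(line.lstrip().startswith("Sequence:") for line in head):
            missing.append(path)
    return missing
```

Passing it something like sorted(glob.glob("output_folder/*.txt.html")) would list the split pieces that lack the header.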

Once again, this only arises on very long file names, and those seem to 
be created by MAKER. Example:
LocalTmp/JobName.maker.output/JobName_datastore/53/6E/10000001/theVoid.chr_number/57/chr_number.191.My_Species_Name_%2Erepeats%2Econsensi%2Efa%2Eclassified%2Ecleaned%2Empi%2E10%2E0.specific

Notice how the last part of the file name has a bunch of identifiers 
separated by the %2E (generic URI-encoding)? I experimented with that 
file name. The path does not matter. The % signs do not matter. It is 
the length of the filename itself: if it is <108 characters, then 
RepeatMasker/TRF runs fine. If it is 108 or more, it breaks. Seems like 
maybe Perl is not handling that long a name very well...

So the problem has three parts: MAKER creates very long file names, TRF 
fails to write the file headers properly for those names, and 
RepeatMasker then breaks on the missing header.

Could you suggest a workaround or patch for this problem? It is 
forcing us to run RepeatMasker separately, outside the main MAKER 
workflow, which really complicates the data management and analysis as a 
whole.
We use RepeatMasker version open-4.0.6, maker-3.00.0-beta and perl 
v5.10.1 built for x86_64-linux-thread-multi.

Many thanks in advance,
Liudmila Mainzer

----------------
Senior Research Scientist
National Center for Supercomputing Applications

Research Assistant Professor
Institute of Genomic Biology

University of Illinois
217-300-0568
1205 W. Clark St. Room 4026
Urbana, IL 61801


From carsonhh at gmail.com  Mon Jan  9 09:30:09 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 9 Jan 2017 09:30:09 -0700
Subject: [maker-devel] MAKER/repeatmasker/TRF parsing of long file names
In-Reply-To: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>
References: <db00e539-d1da-6fc7-c66d-f18a238db418@life.illinois.edu>
Message-ID: <733D5263-6CFC-4AB3-BFDD-30330B0E1985@gmail.com>

The name MAKER uses is based on the input file name, so a quick fix is to rename your input file to something shorter.
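
To see which generated names are at risk, a minimal Python sketch follows. The limit of 108 characters is the threshold reported in this thread, not a documented constant of TRF, RepeatMasker or MAKER, and both function names are hypothetical.

```python
import os

# Threshold reported in this thread; treat it as an observation, not a spec.
REPORTED_LIMIT = 108


def name_too_long(path, limit=REPORTED_LIMIT):
    """True if the file name itself (directories ignored) reaches the limit."""
    return len(os.path.basename(path)) >= limit


def long_names(root, limit=REPORTED_LIMIT):
    """Walk root and return every file path whose file name reaches the limit."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if len(name) >= limit:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Running long_names over the theVoid directories of a datastore would show how much the input names need to be shortened.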

--Carson




From qlian003 at ucr.edu  Wed Jan 11 22:28:32 2017
From: qlian003 at ucr.edu (Qihua Liang)
Date: Wed, 11 Jan 2017 21:28:32 -0800
Subject: [maker-devel] gff file: possible sources
Message-ID: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>

Hi Maker develop team!

I am trying to understand the second column of the GFF file generated by MAKER, which should be the source of each annotation. The tutorial lists the following:

Possible Sources Include:
BLASTN - BLASTN alignment of EST evidence
BLASTX - BLASTX alignment of protein evidence
TBLASTX - TBLASTX alignment of EST evidence from closely related organisms
EST2Genome - Polished EST alignment from Exonerate
Protein2Genome - Polished protein alignment from Exonerate
SNAP - SNAP ab initio gene prediction
GENEMARK - GeneMark ab initio gene prediction
Augustus - Augustus ab initio gene prediction
FgenesH - FGENESH ab initio gene prediction
Repeatmasker - RepeatMasker identified repeat
RepeatRunner - RepeatRunner identified repeat from the repeat protein database
tRNAScan - tRNAScan-SE tRNA predictions (coming soon)
PASA - PASA gene predictions (coming soon)

I noticed other sources in my GFF file, such as cdna2genome. Is there more detailed documentation explaining sources beyond those listed above?

Thanks
Qihua


From dence at genetics.utah.edu  Thu Jan 12 06:28:24 2017
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 12 Jan 2017 13:28:24 +0000
Subject: [maker-devel] gff file: possible sources
In-Reply-To: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>
References: <14573827-470F-4242-8E71-552C57B92EFD@ucr.edu>
Message-ID: <DE48F3CB-8B72-43A6-8331-ED1B811CDCCE@genetics.utah.edu>

Hi Qihua, cdna2genome is the polished TBLASTX alignment from Exonerate. In general, the source column holds the name of the tool that generated the alignment, prediction, or gene model.
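
To list every source that actually occurs in a given file, the second column can simply be tallied. The Python below is a small sketch that assumes standard tab-separated GFF3 with an optional ##FASTA section at the end; the function name is illustrative.

```python
from collections import Counter


def count_sources(lines):
    """Count the values in column 2 (source) of GFF3 feature lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts
```

For a file on disk, open it and pass the handle, for example count_sources(open("genome.gff")); the resulting Counter maps each source name to its number of features.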

~Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330


From patel.kumar.vipul at gmail.com  Fri Jan 20 01:44:26 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 09:44:26 +0100
Subject: [maker-devel] Maker crash for long chrm.
Message-ID: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>

Hi,

I hope someone can help me to figure out what is actually going wrong.

I installed Maker 2.31.9, MPICH, and BioPerl 1.7 via CPAN, and pointed the
TMP variable away from NFS. The bundled test case runs without any
problems, as do small contigs between 1k and 1MB.

When I apply it to a larger sequence, for example one of 57MB, it fails. I
also tried different sequences of around 60MB, with the same outcome.

I looked into the logs, but they were not really helpful; they just stated
that the job failed.

It crashed with the following message:

deleted:0 genes
substr outside of string at /usr/share/perl/5.18/Carp.pm line 165.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.18.2/Bio/Root/Root.pm:447
STACK: Bio::Tools::CodonTable::translate /usr/local/share/perl/5.18.2/Bio/Tools/CodonTable.pm:419
STACK: CGL::TranslationMachine::longest_translation_plus_stop programs/maker/maker/bin/../lib/CGL/TranslationMachine.pm:280
STACK: maker::auto_annotator::get_translation_seq programs/maker/maker/bin/../lib/maker/auto_annotator.pm:3236
STACK: Widget::snap::load_phat_hits programs/maker/maker/bin/../lib/Widget/snap.pm:974
STACK: Widget::snap::parse programs/maker/maker/bin/../lib/Widget/snap.pm:690
STACK: GI::parse_abinit_file programs/maker/maker/bin/../lib/GI.pm:1194
STACK: Process::MpiChunk::_go programs/maker/maker/bin/../lib/Process/MpiChunk.pm:1469
STACK: Process::MpiChunk::run programs/maker/maker/bin/../lib/Process/MpiChunk.pm:341
STACK: programs/maker/maker/bin/maker:979
-----------------------------------------------------------
--> rank=16, hostname=dummy
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:chr_test

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:chr_test

examining contents of the fasta file and run log


--Next Contig--

Processing run.log file...

I get the same message when I run it without MPI, so I guess it is not an
MPI issue.
How can I find out whether some jobs died, in case that is what leads to
this problem?
Any other ideas on how I can tackle it?

Kind regards

From patel.kumar.vipul at gmail.com  Fri Jan 20 06:34:28 2017
From: patel.kumar.vipul at gmail.com (Vipul Patel)
Date: Fri, 20 Jan 2017 14:34:28 +0100
Subject: [maker-devel] Maker crash for long chrm.
In-Reply-To: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
References: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
Message-ID: <CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>

Solved. After some digging and printing I found the problem.

It was snap itself!

For anybody who runs into the same problem: check snap. Apparently it was
not compiled correctly and therefore produced non-conforming output.
Recompiling solved my issue.

Kind regards


From carsonhh at gmail.com  Fri Jan 20 15:00:49 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 20 Jan 2017 15:00:49 -0700
Subject: [maker-devel] Maker crash for long chrm.
In-Reply-To: <CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>
References: <CAGmm4nfyOApO3DhbjFHs00_uNSTvYYUpyk-GQeVMvCUGn9E2Mg@mail.gmail.com>
	<CAGmm4nfkhVRcQ-SrWtsPGcuFG11w76cgQLq9kSfBDGO7Z_vwQQ@mail.gmail.com>
Message-ID: <59841676-741F-496D-9E47-7750417033A4@gmail.com>

I'm glad it's working for you. Let us know if anything else comes up.

--Carson


From mayabritstein at gmail.com  Mon Jan 23 01:30:40 2017
From: mayabritstein at gmail.com (Maya Britstein)
Date: Mon, 23 Jan 2017 10:30:40 +0200
Subject: [maker-devel] Authorization failed.
Message-ID: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>

Hi,

I can't access the maker-devel archives. I am entering my email, and what I
think is my password, but still it doesn't work.

thanks,

Maya

From bmoore at genetics.utah.edu  Mon Jan 23 05:43:53 2017
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon, 23 Jan 2017 12:43:53 +0000
Subject: [maker-devel] Authorization failed.
In-Reply-To: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>
References: <CAPho-ffzR0spZtaypn-dT1s2bPchsyUZRrcrtyrPwEXbfbQBWQ@mail.gmail.com>
Message-ID: <E0148C3A-ACD6-49B2-A39C-C8393D0E9CEA@genetics.utah.edu>

Hi Maya,

If you follow the link below, you will find at the bottom of the page a portion of the form that allows you to reset your password.  It's a little misleading because it looks like it's only an "Unsubscribe" option, but it also takes you to a page where you can update your subscription details, including password reminder/reset.  The actual text for the portion of the page you're looking for is this:

'To unsubscribe from maker-devel, get a password reminder, or change your subscription options enter your subscription email address:'

The link is:

http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Regards,

Barry


From daren.card at gmail.com  Tue Jan 24 07:06:22 2017
From: daren.card at gmail.com (Daren C. Card)
Date: Tue, 24 Jan 2017 08:06:22 -0600
Subject: [maker-devel] Maker error: Invalid nucleotide
Message-ID: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>

Hi everyone,

I'm getting an error with an ongoing Maker run that I'm trying to troubleshoot. This is on a 2nd Maker run, where I used the first to prepare gene models for augustus/snap training, and have incorporated those results into this Maker run. The issue appears to be with augustus, and I'm getting the following type of error message for each contig:

"
Widget::augustus:
/opt/maker/exe/augustus.2.5.5/bin/augustus --species=Boa_constrictor --UTR=off /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0 > /tmp/maker_xnOJ4d/scaffold-92.abinit_masked.0.Boa_constrictor.augustus
#-------------------------------#

/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
	Invalid nucleotide '8' encountered.


/opt/maker/exe/augustus.2.5.5/bin/augustus: ERROR
	Invalid nucleotide '8' encountered.

ERROR: Augustus failed
--> rank=7, hostname=moonunit0
ERROR: Failed while preparing ab-inits
ERROR: Chunk failed at level:0, tier_type:2
FAILED CONTIG:scaffold-92

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold-92

examining contents of the fasta file and run log
"

Augustus is apparently encountering '8' nucleotides, which is weird. I've looked within the contig fasta file in /tmp/ and there are no '8's anywhere except the header lines. Everything else appears to be running without issues.

Any guidance on how I might further interpret and solve this issue would be greatly appreciated. Can provide more information if necessary.

Thanks,
Daren Card

UT-Arlington


From carsonhh at gmail.com  Wed Jan 25 10:37:50 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Jan 2017 10:37:50 -0700
Subject: [maker-devel] Maker error: Invalid nucleotide
In-Reply-To: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>
References: <C1031ABF-E00A-4C65-85EC-C1BC4628DE9E@gmail.com>
Message-ID: <5E13AB7E-9175-4440-AD62-A53BD9DD8DE1@gmail.com>

Try running the contig in question (scaffold-92) as a separate MAKER run. That may help indicate whether the issue is a corrupt intermediate file (if it is, you can set clean_try=1 to force deletion of intermediate files before the rerun).

--Carson
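
One way to follow this suggestion is sketched below, assuming a POSIX shell with awk. Both function names are hypothetical helpers and not part of MAKER: extract_contig pulls a single record out of a multi-FASTA file so it can serve as the genome for a separate run, and find_invalid_nucleotides lists sequence lines containing characters outside the IUPAC nucleotide codes. Which characters Augustus itself accepts is not stated in this thread, so the alphabet used here is an assumption.

```shell
# Hypothetical helpers (not shipped with MAKER); assumes POSIX sh and awk.

# Print the FASTA record whose ID (first word after '>') equals $2.
extract_contig() {
    # $1: FASTA file, $2: sequence ID, e.g. scaffold-92
    awk -v id="$2" '
        /^>/ { keep = (substr($1, 2) == id) }   # header line: select or deselect this record
        keep                                     # print every line of the selected record
    ' "$1"
}

# Print "line_number: line" for sequence lines with non-IUPAC characters.
find_invalid_nucleotides() {
    # $1: FASTA file
    awk '!/^>/ && /[^ACGTUNRYKMSWBDHVacgtunrykmswbdhv]/ { print NR ": " $0 }' "$1"
}
```

For example, `extract_contig genome.fasta scaffold-92 > scaffold-92.fasta` (genome.fasta is a placeholder name) writes a one-contig file that could be given to a fresh MAKER run, and running find_invalid_nucleotides on the intermediate file named in the error log would show whether a stray character is present in the sequence lines.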




From scott at scottcain.net  Wed Jan 25 13:23:02 2017
From: scott at scottcain.net (Scott Cain)
Date: Wed, 25 Jan 2017 15:23:02 -0500
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
Message-ID: <CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>

Hi Maya,

I'm not sure what MAKER's requirements are in this regard--I'm forwarding
this to their mailing list.

Scott


On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com>
wrote:

> Hi,
>
> I have RNA-seq data, and genomic data that I want to annotate using maker.
>
> From what I understood, I need to generate a gff3 file from the
> RNA-seq mapping results. I mapped the RNA sequences to the genome
> using bowtie and tophat. However, I still do not know how to take this
> output and convert it to a gff3 file that I can then use in maker as
> annotation evidence.
>
> I saw the wiki page, that did not mention how to make this conversion (
> http://gmod.org/wiki/GFF3)
>
> Can you please help me?
>
> Sincerely,
> Maya
>
> ----
> Maya Britstein
> Ph.D candidate
> Laura Steindler's Lab
> Marine Biology Department
> Leon H. Charney School of Marine Sciences
> University of Haifa, Israel
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu  Wed Jan 25 15:03:51 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 25 Jan 2017 22:03:51 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
Message-ID: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>

If I recall, from a BAM you would need to run a reference-based assembly on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use Trinity for reference-based assembly.  But I always choose the route of a full de novo assembly (again, Trinity or similar) when possible, doing some basic cleanup (e.g. removing low-confidence transcripts) and bringing the result in as EST evidence.

chris


From qwzhang0601 at gmail.com  Thu Jan 26 13:26:42 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Thu, 26 Jan 2017 15:26:42 -0500
Subject: [maker-devel] canonical protein sequences or isoform?
Message-ID: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>

Hello:

I am doing annotation on a new genome and collecting proteins from mouse. I
found there are both canonical protein sequences and isoforms. I wonder
whether I should use only canonical protein sequences or both the
canonical sequences and the isoforms?

Thanks

Best
Quanwei

From rainer.rutka at uni-konstanz.de  Fri Jan 27 03:31:40 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Fri, 27 Jan 2017 11:31:40 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
Message-ID: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>

Hi everybody.

My name is Rainer. I am an administrator for the HPC systems at our
university in Konstanz, Baden-Wuerttemberg/Germany.
The project is called bwHPC-C5.

See: https://www.bwhpc-c5.de/en/index.php

I have been trying to get Maker running on our bwUniCluster for weeks.
Unfortunately, I get errors while running a Maker job in the MPI environment.

BUILD STATUS

==============================================================================
STATUS MAKER v2.31.9
==============================================================================
PERL Dependencies: VERIFIED
External Programs: VERIFIED
External C Libraries: VERIFIED
MPI SUPPORT: ENABLED
MWAS Web Interface: DISABLED
MAKER PACKAGE: CONFIGURATION OK

MODULES / INCLUDES / COMPILERS

# knbw03 20170117 r.rutka Initial revision knbw02 of module version 2.31.9
#
##### (B) Dependencies:
#
# conflict: any other maker version
# module load compiler/gnu/5.2
# module load mpi/openmpi/2.0-gnu-5.2
[...]

MPI/MOAB SUBMIT

[...]
### Queues ###
#MSUB -q fat
#MSUB -l nodes=1:ppn=16
#MSUB -l mem=20gb
#MSUB -l walltime=50:00:00
#
[...]
echo " "
echo "### Loading MAKER module:"
echo " "
module load bio/maker/2.31.9
[ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
echo "MAKER_VERSION = $MAKER_VERSION"
module list
[...]
echo " "
echo "### Running Maker example"
echo " "
export LD_PRELOAD=${MPI_LIB_DIR}/libmpi.so
export OMPI_MCA_mpi_warn_on_fork=0

echo "LD_PRELOAD=${LD_PRELOAD}"
#
# "STATUS: Processing and indexing input FASTA files..."
#
mpiexec -mca btl ^openib -n 16 maker
[...]


E R R O R S
=======
[...]
LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-gnu-5.2/lib/libmpi.so
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[uc1n338:113607] *** Process received signal ***
[uc1n338:113607] Signal: Segmentation fault (11)
[uc1n338:113607] Signal code: Address not mapped (1)
[uc1n338:113607] Failing at address: 0x4b0
[uc1n338:113608] *** Process received signal ***
[uc1n338:113608] Signal: Segmentation fault (11)
[uc1n338:113608] Signal code: Address not mapped (1)
[uc1n338:113608] Failing at address: 0x4b0
[uc1n338:113621] *** Process received signal ***
[uc1n338:113621] Signal: Segmentation fault (11)
[uc1n338:113621] Signal code: Address not mapped (1)
[uc1n338:113621] Failing at address: 0x4b0
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 113608 on node uc1n338 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[...]

WHATS WRONG HERE!?

Thank you for your help!

All the best ,

Rainer

-- 
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413


From michael.s.campbell1 at gmail.com  Fri Jan 27 08:36:11 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 27 Jan 2017 10:36:11 -0500
Subject: [maker-devel] canonical protein sequences or isoform?
In-Reply-To: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>
References: <CAOW6FSJJ4M8zz2unc-ChcDoa-+EMsHn_aVZoEZCxzChxQovm8w@mail.gmail.com>
Message-ID: <C9A931ED-273F-4B67-B9C2-32C86166312C@gmail.com>

I give MAKER all isoforms as evidence.

Mike


From qwzhang0601 at gmail.com  Fri Jan 27 09:13:22 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Fri, 27 Jan 2017 11:13:22 -0500
Subject: [maker-devel] transcript assembly of RNA-seq data
Message-ID: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>

Hello:

I wonder which is the best way to make use of RNA-seq data for gene
annotation of a new genome assembly.
(1) De novo assembly without mapping to any genome assembly (like Trinity)?
(2) TopHat+Cufflinks, mapping to the new genome assembly that I want to
annotate?
(3) TopHat+Cufflinks, mapping to a closely related annotated genome (like
mouse or human)?

Thanks

Best
Quanwei

From carsonhh at gmail.com  Fri Jan 27 09:23:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 09:23:40 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
Message-ID: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>

(1) De novo assembly without mapping to any genome assembly (like Trinity)

You get a lower false positive rate (TopHat+Cufflinks is too noisy), and protein evidence will make up for any loss of sensitivity associated with the de novo assembly path. Make sure to use the jaccard_clip option to reduce transcript merging in Trinity.

--Carson
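
A minimal sketch of what such a Trinity call might look like is given below. It only prints the command rather than running it. The read file names, the 50G memory limit, the 8 CPUs and the output directory are placeholders that do not come from this thread, and trinity_jaccard_cmd is a hypothetical helper name. Trinity documents --jaccard_clip as intended for paired reads, so paired-end FASTQ input is assumed.

```shell
# Print (do not run) a de novo Trinity command with --jaccard_clip enabled.
# All file names and resource values are placeholders, not taken from the thread.
trinity_jaccard_cmd() {
    # $1: left (R1) reads, $2: right (R2) reads, $3: output directory
    echo "Trinity --seqType fq --left $1 --right $2" \
         "--max_memory 50G --CPU 8 --jaccard_clip --output $3"
}

trinity_jaccard_cmd reads_1.fq reads_2.fq trinity_out
```

The resulting Trinity.fasta would then be passed to MAKER as EST evidence, with protein evidence supplied alongside it as described above.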



From cjfields at illinois.edu  Fri Jan 27 15:21:15 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 27 Jan 2017 22:21:15 +0000
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
	<4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
Message-ID: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>

Yup, I agree.  Carson, would you know of any instances where HISAT2/STAR+StringTie or reference-based Trinity assemblies were (successfully) used?

chris


From carsonhh at gmail.com  Fri Jan 27 17:53:10 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Jan 2017 17:53:10 -0700
Subject: [maker-devel] transcript assembly of RNA-seq data
In-Reply-To: <90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
References: <CAOW6FSL4tVSkUx6xAcemzRmq9D2+YCV0NUiQve-qNrCOfiXz=w@mail.gmail.com>
	<4039F2B6-4DE8-479D-8EB8-A9B40C2C5218@gmail.com>
	<90A5F6C2-AB37-4098-8CF6-9906F4E7C173@illinois.edu>
Message-ID: <DA117F8A-20D0-4F99-96E5-CFF4FDAB1799@gmail.com>

No. My experience has just been with regular Trinity de novo assembly. Of course, I'd be interested in anyone else's attempt at this though.

--Carson



From carsonhh at gmail.com  Sat Jan 28 13:53:45 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 28 Jan 2017 13:53:45 -0700
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
References: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
Message-ID: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>

Try adding one of the following to your mpiexec command -->

1. --mca btl ^openib
2. --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
3. --mca btl vader,tcp,self --mca btl_tcp_if_include eth0

One or the other may fix your issue.  The first causes OpenMPI to not use the infiniband communication option (infiniband libraries use registered memory in a way that causes system calls to generate segfaults). It will usually force communication to go over another adapter. The second tries to use the infiniband adapter, but uses TCP over infiniband (a way to indirectly bypass the problem-causing libraries). The third specifically forces the use of the ethernet adapter instead of the infiniband adapter.

--Carson
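
The three variants can be written out as full command lines, as in the sketch below, which prints them without launching anything. The rank count of 16 comes from the submit script in the original report. The interface names ib0 and eth0 are the ones given above and are site-specific, so they would need checking on the cluster (for example with `ip addr`). print_mpi_candidates is a hypothetical helper name. Note that the submit script in the report appears to pass `-mca btl ^openib` already, which corresponds to the first variant, so the second and third are the ones not yet tried there.

```shell
# Print the three candidate mpiexec command lines (dry run only).
print_mpi_candidates() {
    # $1: number of MPI ranks
    n="$1"
    for opts in \
        "--mca btl ^openib" \
        "--mca btl vader,tcp,self --mca btl_tcp_if_include ib0" \
        "--mca btl vader,tcp,self --mca btl_tcp_if_include eth0"
    do
        echo "mpiexec $opts -n $n maker"
    done
}

print_mpi_candidates 16
```

Each printed line can be pasted into the job script in place of the existing mpiexec call, one at a time, to see which transport setting avoids the segmentation fault.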




From rainer.rutka at uni-konstanz.de  Mon Jan 30 01:32:08 2017
From: rainer.rutka at uni-konstanz.de (Rainer Rutka)
Date: Mon, 30 Jan 2017 09:32:08 +0100
Subject: [maker-devel] Maker-Error when started with OpenMPI
In-Reply-To: <73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
References: <f30d7683-c103-d33c-6c58-a36677057c0a@uni-konstanz.de>
	<73509312-0658-4A58-90A8-6D3143EDB1C7@gmail.com>
Message-ID: <c89c60e5-1162-1297-5d71-99b1cbf315ec@uni-konstanz.de>

Hi Carson!

Thank you VERY MUCH for your hints.

Much appreciated!

I'll test these today and let you know about the results.

Again: THANKS! :-)

BTW: I'm not a scientist. Only a system operator.

:-)


-- 
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
* High-Performance-Computing (HPC)
* KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413


From qwzhang0601 at gmail.com  Tue Jan 31 10:36:13 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 12:36:13 -0500
Subject: [maker-devel] collecting protein sequences as evidences
Message-ID: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>

I wonder what's the best way to collect protein sequences for gene
annotation of a de novo genome assembly.
(1) My first choice is to get protein sequences of human and mouse from
UniProt. At this step, I am not clear whether I should download the
reviewed ones (i.e., Swiss-Prot) or the automatically annotated ones (i.e.,
TrEMBL).
(2) On the other hand, I also get protein sequences from NCBI. Should I
simply merge those fasta files? Does it matter if there are
redundancies? Also, if I get protein sequences from different sources,
they may not have the same quality. Do I need to do something before I
integrate protein sequences from different sources?

Many thanks

Best
Quanwei

From qwzhang0601 at gmail.com  Tue Jan 31 12:08:21 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 14:08:21 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from different
	tissues and individuals
Message-ID: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>

Hello:

I am trying to assemble transcripts from RNA-seq data with Trinity; the
assemblies will be used as evidence for gene annotation in Maker. I have
data from two tissues with two replicates each. Should I merge all four
samples to get one assembly file? Or should I merge the replicates of each
tissue separately and use the two assembly files as input to Maker? Merging
all samples into one gives much higher coverage, but I think there may be
some genes expressed as tissue-specific isoforms, so I am not sure whether
I should merge RNA-seq from different tissues.
What's more, I found some published RNA-seq data from another individual
of the same species (and from a different tissue than ours). Should I
merge all RNA-seq together (across individuals and tissues)? Or should I
generate separate transcript assemblies and use all of them as
input to Maker?

Thanks
Best
Quanwei

From michael.s.campbell1 at gmail.com  Tue Jan 31 12:26:29 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 14:26:29 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from
	different tissues and individuals
In-Reply-To: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
Message-ID: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>

I would probably try merging the replicates but not the tissues. You can then pass the output files to MAKER as a comma-separated list in the opts file.

Example: 
est=/PATH/TO/file1.fasta,/PATH/TO/file2.fasta

Good luck,
Mike
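
As a small sketch of the bookkeeping, assuming one Trinity assembly per tissue, the est= line for the opts file can be built from a list of paths. make_est_line is a hypothetical helper name and the tissue file names are placeholders. Only the est= key and the comma-separated form come from the example above.

```shell
# Join any number of assembly paths into one comma-separated est= line.
# Runs in a subshell so the IFS change does not leak into the caller.
make_est_line() (
    IFS=,
    echo "est=$*"    # "$*" joins the arguments with the first character of IFS
)

make_est_line /PATH/TO/tissueA_Trinity.fasta /PATH/TO/tissueB_Trinity.fasta
```

An assembly built from the published data of another individual could be appended as a further argument in the same way, which keeps each source as a separate evidence file rather than one merged assembly.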



From michael.s.campbell1 at gmail.com  Tue Jan 31 13:57:28 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 31 Jan 2017 15:57:28 -0500
Subject: [maker-devel] collecting protein sequences as evidences
In-Reply-To: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>
References: <CAOW6FSKhfeYz-BZdgQZsk1QGPOYzFanwCB-caLQsR+7Z2WBQcA@mail.gmail.com>
Message-ID: <2E4D90C9-6D6E-4F52-A361-AFB06A61D2C2@gmail.com>

Hi Quanwei,

(1) When I use UniProt, I use Swiss-Prot and not TrEMBL.
(2) I don't merge files together. I just pass them all to MAKER as a comma-separated list.

Thanks,
Mike

> On Jan 31, 2017, at 12:36 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> I wonder what the best way is to collect protein sequences for gene annotation of a de novo genome assembly.
> (1) My first choice is to get human and mouse protein sequences from UniProt. At this step, I am not clear whether I should download the reviewed ones (i.e., Swiss-Prot) or the automatically annotated ones (i.e., TrEMBL).
> (2) On the other hand, I also get protein sequences from NCBI. Should I simply merge those FASTA files? Does it matter if there are redundancies? Also, protein sequences from different sources may not have the same quality. Do I need to do anything before I integrate protein sequences from different sources?
> 
> Many thanks
> 
> Best
> Quanwei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


From cjfields at illinois.edu  Tue Jan 31 14:05:43 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 21:05:43 +0000
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
Message-ID: <A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>

I agree with Mike.  I also suggest not combining RNA-seq data from different runs (e.g. different studies), even if they are from the same tissue, developmental stage, etc. Many other factors (biological variation, sample quality, differences in sequencing chemistry or technology, etc.) can significantly and negatively affect transcript assembly quality.

chris


From mjfi2sb3 at gmail.com  Tue Jan 31 14:14:14 2017
From: mjfi2sb3 at gmail.com (Salim Bougouffa)
Date: Tue, 31 Jan 2017 21:14:14 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
	<357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
Message-ID: <CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>

Hi Christopher,

How would you identify a low confidence transcript? And how do you remove
them? Also, did you try setting a minimum read coverage in Trinity as the
default is one?

Best
/SB

On Thu, 26 Jan 2017, 01:04 Fields, Christopher J, <cjfields at illinois.edu>
wrote:

> If I recall, from a BAM you would need to run a reference-based assembly
> on these data (e.g. Cufflinks2 or StringTie) to get this; you can also use
> Trinity for ref-based assembly.  But I always choose the route of a full de
> novo assembly (again, Trinity or similar) when possible, doing some basic
> cleanup (e.g. remove low confidence transcripts) and bring them as EST
> evidence.
>
> chris
>
> From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of
> Scott Cain <scott at scottcain.net>
> Date: Wednesday, January 25, 2017 at 2:23 PM
> To: Maya Britstein <mayabritstein at gmail.com>
> Cc: "maker-devel at yandell-lab.org List" <maker-devel at yandell-lab.org>, "
> help at gmod.org" <help at gmod.org>
> Subject: Re: [maker-devel] GFF3 file format
>
> Hi Maya,
>
> I'm not sure what MAKER's requirements are in this regard--I'm forwarding
> this to their mailing list.
>
> Scott
>
>
> On Wed, Jan 25, 2017 at 3:12 PM, Maya Britstein <mayabritstein at gmail.com>
> wrote:
>
> Hi,
>
> I have RNA-seq data, and genomic data that I want to annotate using maker.
>
> From what I understood, I need to generate a GFF3 file from the mapped
> RNA-seq reads. I mapped the RNA sequences to the genome using Bowtie and
> TopHat. However, I still do not know how to convert that output into a
> GFF3 file that I can then use in MAKER as annotation evidence.
>
> I saw the wiki page, but it did not mention how to make this conversion
> (http://gmod.org/wiki/GFF3).
>
> Can you please help me?
>
> Sincerely,
> Maya
>
> ----
> Maya Britstein
> Ph.D candidate
> Laura Steindler's Lab
> Marine Biology Department
> Leon H. Charney School of Marine Sciences
> University of Haifa, Israel
>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>

From qwzhang0601 at gmail.com  Tue Jan 31 14:33:12 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 31 Jan 2017 16:33:12 -0500
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
Message-ID: <CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>

Thank you guys for your suggestions. So you do not recommend using RNA-seq
data from another study, even if I assemble them separately and then
provide both assemblies to MAKER as a comma-separated list. The issues you
mentioned do exist, but some people have collected RNA-seq data from
different individuals and used them for gene annotation (e.g.,
doi:10.1038/ng.3198). In any case, thank you for your suggestions; I will
think about it.

Best
Quanwei


From carsonhh at gmail.com  Tue Jan 31 14:35:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 31 Jan 2017 14:35:20 -0700
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
	<CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
Message-ID: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>

I think he means not to combine them when preparing the transcript assemblies (i.e. assemble them separately). But you still provide them all to MAKER as a comma-separated list.
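
One way to picture the advice in this thread is the rough sketch below, which plans one assembly per study and tissue: replicates are pooled, while tissues and studies are kept apart. The function name plan_assemblies, the tissue names and the read file names are all made up for illustration.

```python
from collections import defaultdict


def plan_assemblies(samples):
    """Group read files into one assembly per (study, tissue) pair.

    Replicates of the same tissue within a study are pooled; different
    tissues and different studies stay separate.
    """
    groups = defaultdict(list)
    for study, tissue, reads in samples:
        groups[(study, tissue)].append(reads)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample sheet: (study, tissue, read file).
    samples = [
        ("ours", "liver", "liver_rep1.fq"),
        ("ours", "liver", "liver_rep2.fq"),
        ("ours", "brain", "brain_rep1.fq"),
        ("ours", "brain", "brain_rep2.fq"),
        ("published", "kidney", "kidney.fq"),
    ]
    for (study, tissue), reads in plan_assemblies(samples).items():
        print(f"{study}/{tissue}: assemble {', '.join(reads)}")
```

Each group would be assembled on its own, and every resulting FASTA file would then go into the comma-separated list given to MAKER.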

--Carson
 

From cjfields at illinois.edu  Tue Jan 31 16:05:43 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 23:05:43 +0000
Subject: [maker-devel] GFF3 file format
In-Reply-To: <CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>
References: <CAPho-ffGGQX0qT96Qa6BmBBn8kn89cinVy3wkj8RxDN7QnNZBg@mail.gmail.com>
	<CA+JTaoxR5XXoqFq16NaWUoDFE6tg0CfNFyU9ksORnLWvJP-2EQ@mail.gmail.com>
	<357E7CE8-2E9E-4F47-B3F7-9C54BB5267FC@illinois.edu>
	<CAJb_6LT8WSewfuQL0V83H-3m419EuoCbGF=C7B9PeKpaVgd74Q@mail.gmail.com>
Message-ID: <8BD384C9-4E46-42AC-A59F-96299EF5E104@illinois.edu>

You can use RSEM for some initial filtering:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification#filtering-transcripts

Then I generally use the Trinity QA steps, in particular TransRate or DETONATE:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment
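
To give a rough idea of what expression-based filtering can look like, here is a small stand-alone sketch. It is not the Trinity script described at the first link. It assumes a tab-delimited table with transcript_id and TPM columns, as in RSEM's isoforms.results output, and the cutoff of 1.0 TPM is an arbitrary placeholder rather than a recommendation.

```python
import csv
import io


def read_tpm(rsem_table):
    """Map transcript_id -> TPM from an RSEM isoforms.results-style table."""
    reader = csv.DictReader(rsem_table, delimiter="\t")
    return {row["transcript_id"]: float(row["TPM"]) for row in reader}


def keep(header, tpm, min_tpm):
    """True if the first token of the FASTA header has TPM >= min_tpm."""
    parts = header.split()
    return bool(parts) and parts[0] in tpm and tpm[parts[0]] >= min_tpm


def filter_fasta(fasta, tpm, min_tpm):
    """Yield (header, sequence) for records passing the TPM cutoff.

    Transcripts missing from the table are dropped.
    """
    header, seq = None, []
    for line in fasta:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None and keep(header, tpm, min_tpm):
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None and keep(header, tpm, min_tpm):
        yield header, "".join(seq)


if __name__ == "__main__":
    # Tiny made-up inputs, just to show the call pattern.
    table = io.StringIO("transcript_id\tTPM\nt1\t12.5\nt2\t0.2\n")
    fasta = io.StringIO(">t1\nACGT\n>t2\nTTTT\n")
    for name, sequence in filter_fasta(fasta, read_tpm(table), 1.0):
        print(f">{name}\n{sequence}")
```

With real data, the two StringIO objects would be replaced by open file handles for the RSEM table and the Trinity FASTA.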

chris


From cjfields at illinois.edu  Tue Jan 31 16:07:44 2017
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 31 Jan 2017 23:07:44 +0000
Subject: [maker-devel] Transcript assembly of RNA-seq data from
 different tissues and individuals
In-Reply-To: <656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>
References: <CAOW6FS+G4CMBK99Mm9FHgVjwtN=CQ0LMk7XqNpAyqOYL7ZU2xg@mail.gmail.com>
	<873B8BD6-E2A7-4D5E-B1B1-1C313A7535AF@gmail.com>
	<A42F676B-42C4-4C68-A453-DDF0C4C4F35B@illinois.edu>
	<CAOW6FSKBWwhxGgQ9wahEujS_zcgPiAH569ypZG+C-iUQTGs7FQ@mail.gmail.com>
	<656C379A-906C-44AF-9503-4DD27203FC57@gmail.com>
Message-ID: <CAE4C80D-DD8F-4A2C-A33B-535456D233AE@illinois.edu>

Exactly

chris
