<HTML>
<HEAD>
<TITLE>Re: Maker: some suggestions and comments</TITLE>
</HEAD>
<BODY>
<FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Hello, and thanks for all the excellent comments and suggestions. I’ll try to answer all the questions you had. <BR>
<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><FONT COLOR="#0000FF">>Judging from the results WU-BLAST is much slower than NCBI BLAST. This is why I’m using NCBI’s BLAST suite.<BR>
</FONT><BR>
This may be partially due to the fact that we use more filtering options in WUBLAST (the same options just don’t exist in NCBI BLAST). These filtering options are the primary reason we prefer WUBLAST to NCBI BLAST. We set NCBI BLAST up to mimic the WUBLAST filters as closely as possible, but you still get a handful of odd alignments. If you want a really in-depth look at the differences between WUBLAST and NCBI BLAST, I recommend the O’Reilly BLAST book (coincidentally, Mark Yandell, the PI on the MAKER project, is one of the authors).<BR>
<BR>
<FONT COLOR="#0000FF">>I don’t understand why MAKER is using the –z –Y and –K flags and not the defaults.<BR>
</FONT><BR>
We use –z and –Y to normalize the search space for the statistics that calculate the e-value. By default BLAST calculates these values on the fly, so they can be different for each search. Because we have to divide large genomic sequences up into chunks (otherwise BLAST will die), the values calculated by BLAST become incorrect and start to lose their meaning. We instead set the values to an average genome length and an estimate of the average length of a protein. With the values hard coded, the e-values become normalized to each other and can then be compared directly across sequence chunks and contigs.<BR>
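<BR>
As a rough illustration of why a fixed search space matters, here is the standard Karlin-Altschul relation between a bit score and its e-value. This is textbook BLAST statistics rather than MAKER code, and the symbols are notation introduced only for this sketch: m' and n' are the effective query and database lengths, and S' is the bit score.<BR>
<PRE>
```latex
E = m' \, n' \, 2^{-S'}
\qquad\Longrightarrow\qquad
\frac{E_1}{E_2} = \frac{m'_1 \, n'_1}{m'_2 \, n'_2}
\quad \text{for two hits with the same bit score } S'
```
</PRE>
If the product m'n' is recomputed for every chunk, two hits with identical bit scores get different e-values simply because their chunks differ in length. Holding the product constant makes the ratio 1, so the e-value depends on the bit score alone.<BR>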
<BR>
We set –K to 100 because HSPs beyond this number for a single hit end up being spurious and get filtered out anyway by downstream internal filters. Setting it speeds up both the BLAST searches and internal MAKER processing.<BR>
<BR>
<BR>
<FONT COLOR="#0000FF">>When running the masking steps I found some errors concerning the e value given in the control files. I changed the code in the GI.pm subroutine...<BR>
</FONT><BR>
Thanks for catching that bug. I’ve now made the necessary changes to the MAKER code base. <BR>
<BR>
<BR>
<FONT COLOR="#0000FF">>Runlevel 2 failed in this cases because blastall couldn’t calculate the parameters.<BR>
</FONT><BR>
I’ve seen the problem before in WUBLAST, and I know I fixed MAKER for WUBLAST; but now that you mention it, I haven’t done so for NCBI BLAST. I’ll take care of that now. Thanks. <BR>
<BR>
<BR>
<FONT COLOR="#0000FF">>After specifiying my tmp_dir in the control files it hasn’t worked the way I wanted it. Therefore I altered once again GI.pm: $TMP = tempdir("maker_XXXXXX", CLEANUP => 1, TMPDIR => 1, DIR => ‘path’);<BR>
</FONT><BR>
I’m not exactly sure what you mean here by “worked the way I wanted it”, so I can’t yet tell just what change is being suggested and why.<BR>
<BR>
<BR>
<FONT COLOR="#0000FF">>To get the STDOUT/STDERR output of MAKER into a file I added the lines ... directly to /bin/maker. This helps me in debugging.<BR>
</FONT><BR>
Another convenient way to do this on the command line is to use ‘tee’, e.g.:<BR>
<BR>
maker 2>&1 | tee file_name<BR>
<BR>
This writes the output to the screen and to the file name given to ‘tee’ simultaneously. I don’t think maker ever actually writes to STDOUT. Instead it sends all status and error messages to STDERR, so redirecting STDERR to STDOUT before piping to ‘tee’ captures everything (that is the 2>&1 in the example above).<BR>
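<BR>
If you want to convince yourself of the difference, the small sketch below may help. It does not run maker itself: fake_maker is a hypothetical stand-in that, like maker as far as I know, writes only to STDERR, and the log file is just a temporary file created for the demonstration.<BR>
<PRE>
```shell
#!/bin/sh
# Hypothetical stand-in for maker: writes its status messages to STDERR only.
fake_maker() {
    echo "status: starting run" >&2
    echo "status: finished" >&2
}

log_file=$(mktemp)

# Without 2>&1 only STDOUT enters the pipe, so the messages still reach
# the screen but tee has nothing to write to the file.
fake_maker | tee "$log_file"
without=$(awk 'END { print NR }' "$log_file")

# With 2>&1 STDERR is merged into STDOUT before the pipe, so tee sees
# (and records) every message.
fake_maker 2>&1 | tee "$log_file"
with=$(awk 'END { print NR }' "$log_file")

echo "lines captured without 2>&1: $without"
echo "lines captured with 2>&1: $with"

rm -f "$log_file"
```
</PRE>
The first count should be zero and the second should match the number of messages printed, which is why the redirection has to come before the pipe.<BR>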
<BR>
<BR>
<FONT COLOR="#0000FF">>I also tried to alter the code to perform BLAST queries using the NCBI BLAST server. This would by a great alternative to standalone BLAST. But because of lack of time I haven’t continued to follow this direction. I also had to cheat a little bit to get the nr database to work.<BR>
</FONT> <BR>
Interesting. Good suggestion. I’m sure that looking through the MAKER code has not been easy. Much of the code base is optimized to run via MPI on a cluster (no shared memory), so a lot of the code is somewhat non-linear (e.g. Lib/Process/MpiChunk.pm). If you can wind your way through it, my hat’s off to you.<BR>
<BR>
Also, I know NR caused difficulties in previous versions of MAKER because of its size (weird indexing issues in the BioPerl module Bio::FastaDB). The current version of MAKER breaks the database up into pieces first to get around these issues, so make sure you have the most recent version of MAKER (version 09-09-2009).<BR>
<BR>
Just for your own interest, I would suggest that you try installing MPICH2 and running mpi_maker if you have a cluster available to you (or even just a multiprocessor system). You will get even better performance as this will parallelize other steps beyond just BLAST (which is controlled by the cpus option in the control files).<BR>
<BR>
<BR>
<FONT COLOR="#0000FF">>Since I’m an autodidact and I have no contact to any person involved in bioinformatics it would be great to stay in contact with you. My points are not meant to annoy you. It is my first time to participate in an open project. <BR>
</FONT> <BR>
All suggestions are very welcome.<BR>
<BR>
Thanks,<BR>
Carson<BR>
<BR>
</SPAN></FONT></BLOCKQUOTE>
</BODY>
</HTML>