[maker-devel] PARALLELIZED DE NOVO GENOME ANNOTATION WITHOUT MPI

Quanwei Zhang qwzhang0601 at gmail.com
Wed Mar 1 14:09:30 MST 2017


Thank you. I have submitted my jobs to our server. My plan is as follows:
(1) split the contigs into 50 files; (2) for each contig file, collect the
annotations in GFF format and the protein sequences in FASTA format;
(3) manually merge the 50 GFF files and the 50 protein sequence files. Is
this approach also correct?

Best
Quanwei
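
The split and merge steps in this plan can be sketched in shell. This is
only an illustration under stated assumptions: the function names, the
chunk file naming, and the choice to cut each GFF3 file at its "##FASTA"
line are hypothetical and do not come from this thread. MAKER also ships
its own gff3_merge and fasta_merge scripts, which may be the safer choice
for real output.

```shell
#!/bin/sh
# Hypothetical helpers; all file names below are placeholders.

# Split a multi-FASTA file into N chunk files, assigning records
# round-robin (this balances record counts, not total sequence length).
# Usage: split_fasta contigs.fa 50 chunk   -> chunk_1.fa ... chunk_50.fa
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/ { i = (count++ % n) + 1; out = prefix "_" i ".fa" }
        out  { print > out }
    ' "$1"
}

# Concatenate GFF3 files under a single "##gff-version 3" header.
# Assumption: anything from a "##FASTA" line onward is dropped, because
# feature lines that follow an embedded FASTA section are not valid GFF3.
# Usage: merge_gff chunk_*.gff > all.gff
merge_gff() {
    echo "##gff-version 3"
    for f in "$@"; do
        awk '/^##FASTA/ { exit } !/^##gff-version/ { print }' "$f"
    done
}

# Protein FASTA files need no special handling:
#   cat chunk_*.proteins.fasta > all.proteins.fasta
```

This sketch does not check for duplicate feature IDs across files, so it
is worth verifying that IDs stay unique after merging.
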

2017-03-01 15:54 GMT-05:00 Carson Holt <carsonhh at gmail.com>:

> If you split into separate files, you can use the -g option to select the
> input file together with the -base option so all output goes to the same
> directory. Because they technically have different input files, this will
> avoid file locking issues. You have to use the -dsindex option at the end
> to rebuild the datastore index, so it looks like a single job. But that is
> one way to get around the issue.
>
> —Carson
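
A minimal sketch of the -g / -base / -dsindex approach described above.
Only those three options come from this thread. The function name, the
chunk file names, the base name "genome", and the exact form of the final
-dsindex call are assumptions, so check them against "maker -h" for your
version. The function prints the commands instead of running them, so
each line can be wrapped in whatever job submission your cluster uses.

```shell
#!/bin/sh
# Print one MAKER command per chunk file, all sharing one -base name so
# that output lands in the same directory, followed by a final command
# that rebuilds the datastore index.
# Assumption: the MAKER control files sit in the current directory.
# Usage: maker_chunk_cmds genome chunk_1.fa chunk_2.fa ...
maker_chunk_cmds() {
    base=$1
    shift
    i=0
    for chunk in "$@"; do
        i=$((i + 1))
        echo "maker -g $chunk -base $base 2> maker_$i.error"
    done
    # Run this last, after every chunk job has finished.
    echo "maker -base $base -dsindex"
}
```

For example, "maker_chunk_cmds genome chunk_*.fa" lists the per-chunk
commands first and the index rebuild last.
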
>
>
>
> On Mar 1, 2017, at 1:52 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>
> Thank you. But I ran into some problems with MPI on our server, so I have
> now split my contigs into several files and am annotating those files
> separately. After I finish annotating each file, I will merge the results.
>
> Thank you for your explanation!
>
> Best
> Quanwei
>
> 2017-03-01 15:36 GMT-05:00 Carson Holt <carsonhh at gmail.com>:
>
>> If you submit too many simultaneous MAKER runs, then file locks will start
>> to collide and one run will slow down the others. You should submit fewer
>> simultaneous jobs and instead use MPI (maker must be configured and
>> compiled to use MPI).
>>
>> An example MPI launch command for running on 200 CPUs on a cluster:
>> mpiexec -n 200 maker 2> maker_mpi1.error
>>
>> —Carson
>>
>>
>>
>> > On Feb 27, 2017, at 8:25 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>> >
>> > Hello:
>> >
>> > I am doing genome annotation using MAKER on our high-performance
>> > computing (HPC) cluster. Due to some issues with MPI, I submitted the
>> > MAKER job several times from the same directory. Following the example
>> > in the protocol (shown below), I ran every job except the first as a
>> > background process with "&". Is this necessary when submitting jobs to
>> > an HPC cluster? The run took much longer than I expected (based on a
>> > test with a smaller data set). Could running the jobs as background
>> > processes be the cause?
>> >
>> > The example in the protocol
>> > % maker 2> maker1.error
>> > % maker 2> maker2.error &
>> > % maker 2> maker3.error &
>> > ......
>> >
>> > BTW, will annotating a shorter contig (e.g., 500 bp) take about 1/100
>> > of the time needed to annotate a 50,000 bp contig? I am using SNAP for
>> > ab initio prediction, with an RNA-seq assembly and protein sequences as
>> > evidence. More than half of my contigs are shorter than 300 bp, but
>> > together they make up only about 5% of the total contig length. I want
>> > to know whether ignoring those short contigs would save about half of
>> > the run time, or only about 5%.
>> >
>> >  Thanks
>> >
>> > Best
>> > Quanwei
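
Before deciding whether to drop short contigs, it may help to measure how
many contigs and how many bases actually fall below a cutoff. The sketch
below is hypothetical (the function name and the file name in the usage
line are not from this thread). It only reports counts and makes no claim
about how MAKER run time scales with contig length.

```shell
#!/bin/sh
# Report what fraction of contigs, and of total bases, is shorter than a
# given length cutoff in a multi-FASTA file.
# Usage: short_contig_stats contigs.fa 300
short_contig_stats() {
    awk -v min="$2" '
        # Add the contig that was just read to the running totals.
        function tally() {
            if (name != "") {
                n++
                total += len
                if (len < min) { short_n++; short_bp += len }
            }
        }
        /^>/ { tally(); name = $0; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END {
            tally()
            if (n == 0 || total == 0) { print "no sequence found"; exit 1 }
            printf "contigs=%d short=%d (%.1f%%) bases=%d short_bases=%d (%.1f%%)\n",
                n, short_n, 100 * short_n / n, total, short_bp, 100 * short_bp / total
        }
    ' "$1"
}
```

If the short contigs turn out to be dispensable, the min_contig setting in
maker_opts.ctl can skip them without editing the FASTA file.
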