[maker-devel] Does maker support multi-processing for a single long fasta sequence using openMPI?

Carson Holt carsonhh at gmail.com
Mon Oct 26 13:33:00 MDT 2020


Your MPI processes may not be seeing each other, so you are getting multiple independent MAKER runs that all collide.  You need to reinstall MAKER and answer ‘yes’ to the question about compiling for MPI.  If reinstalling MAKER alone does not work, you may also have to reinstall OpenMPI.  You can test whether MAKER is using MPI by running the following command:

mpiexec -mca btl ^openib -n 40 maker -help

If you get a single help message then everything is fine.  If you get 40 help messages, then MPI is not communicating correctly.
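
The check described above can be automated. The following is a minimal sketch, not part of MAKER: it counts help messages in the captured output of the test command. It assumes that every MAKER help message contains exactly one line starting with "Usage:"; that marker and the function names are illustrative assumptions, so adjust the marker if your MAKER version prints something different.

```python
import subprocess


def count_help_banners(output: str, marker: str = "Usage:") -> int:
    """Count help messages in captured output.

    Assumption: each help message contains exactly one line that starts
    with `marker`. Change `marker` if your MAKER version differs.
    """
    return sum(
        1 for line in output.splitlines() if line.strip().startswith(marker)
    )


def mpi_looks_healthy(output: str, marker: str = "Usage:") -> bool:
    """True only when exactly one help message was printed."""
    return count_help_banners(output, marker) == 1


def run_mpi_help_test(nprocs: int = 40) -> str:
    """Run the test command from this message and return stdout + stderr."""
    cmd = ["mpiexec", "-mca", "btl", "^openib", "-n", str(nprocs), "maker", "-help"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr
```

On a machine with MAKER and OpenMPI installed, `mpi_looks_healthy(run_mpi_help_test(40))` returning False with a banner count of 40 would correspond to the "MPI is not communicating" case.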

—Carson


> On Oct 20, 2020, at 10:23 AM, Xu, taosheng <taosheng.x at gmail.com> wrote:
> 
> Thank you very much, Carson, for your timely response.
> Yes, I think so. MAKER with MPI should support multi-processing of a single ultra-long sequence, but I cannot run it successfully on a single sequence.
> First, I made sure that OpenMPI and MAKER were installed properly. It works well in parallel for multiple DNA sequences.
> When I submit a MAKER job for a single ultra-long sequence (mpiexec -mca btl ^openib -n 40 maker -g scaffold1.fasta -fix_nucleotides, with max_dna_len set to 100000), only one MAKER process keeps running. The other MAKER processes disappear and are reported as finished in the output. See the output below. Please help me check it. Thanks for your kind help and your time.
> 
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
> 
> Best regards,
> Taosheng
> 
> 
> 
> OUTPUT Information
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> examining contents of the fasta file and run log
> A data structure will be created for you at:
> /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore
> 
> To access files for individual sequences use the datastore index:
> /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_master_datastore_index.log
> 
> STATUS: Now running MAKER...
> 
> 
> 
> --Next Contig--
> 
> Processing run.log file...
> examining contents of the fasta file and run log
> 
> 
> 
> --Next Contig--
> 
> #---------------------------------------------------------------------
> Another instance of maker is processing this contig!!
> SeqID: scaffold#1
> Length: 73580997
> #---------------------------------------------------------------------
> 
> 
> examining contents of the fasta file and run log
> 
> 
> 
> --Next Contig--
> 
> #---------------------------------------------------------------------
> Another instance of maker is processing this contig!!
> SeqID: scaffold#1
> Length: 73580997
> #---------------------------------------------------------------------
> 
> 
> 
> 
> Maker is now finished!!!
> 
> 
> 
> Start_time: 1603000030
> End_time:   1603000033
> Elapsed:    3
> A data structure will be created for you at:
> /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore
> 
> To access files for individual sequences use the datastore index:
> /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_master_datastore_index.log
> 
> STATUS: Now running MAKER...
> MAKER WARNING: The file plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/28/scaffold#1.289.plant_repeatFinal%2Elib%2Empi%2E10%2E2.specific.out
> did not finish on the last run and must be erased
> WARNING: Multiple MAKER processes have been started in the
> same directory.
> 
> STATUS: Processing and indexing input FASTA files...
> examining contents of the fasta file and run log
> 
> 
> 
> --Next Contig--
> 
> #---------------------------------------------------------------------
> Another instance of maker is processing this contig!!
> SeqID: scaffold#1
> Length: 73580997
> #---------------------------------------------------------------------
> 
> 
> WARNING: Multiple MAKER processes have been started in the
> same directory.
> 
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
> /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore
> 
> To access files for individual sequences use the datastore index:
> /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_master_datastore_index.log
> 
> STATUS: Now running MAKER...
> examining contents of the fasta file and run log
> 
> 
> 
> --Next Contig--
> 
> #---------------------------------------------------------------------
> Another instance of maker is processing this contig!!
> SeqID: scaffold#1
> Length: 73580997
> #---------------------------------------------------------------------
> 
> 
> 
> 
> Maker is now finished!!!
> 
> 
> 
> Start_time: 1603000030
> End_time:   1603000034
> Elapsed:    4
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: scaffold#1
> Length: 73580997
> #---------------------------------------------------------------------
> 
> 
> examining contents of the fasta file and run log
> 
> 
> 
> --Next Contig--
> 
> #---------------------------------------------------------------------
> Another instance of maker is processing this contig!!
> SeqID: scaffold#1
> Length: 73580997
> #---------------------------------------------------------------------
> 
> 
> examining contents of the fasta file and run log
> 
> 
> 
> --Next Contig--
> 
> #---------------------------------------------------------------------
> Another instance of maker is processing this contig!!
> SeqID: scaffold#1
> Length: 73580997
> #---------------------------------------------------------------------
> 
> 
> 
> 
> Start_time: 1603000030
> End_time:   1603000034
> Elapsed:    4
> 
> 
> Maker is now finished!!!
> 
> A data structure will be created for you at:
> /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore
> 
> To access files for individual sequences use the datastore index:
> /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_master_datastore_index.log
> 
> STATUS: Now running MAKER...
> examining contents of the fasta file and run log
> ..........
> 
> Maker is now finished!!!
> 
> 
> 
> Start_time: 1602920407
> End_time:   1602920580
> Elapsed:    173
> running  repeat masker.
> #--------- command -------------#
> Widget::RepeatMasker:
> cd /tmp/maker_VflRYH; ./maker3.01/exe/RepeatMasker/RepeatMasker /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0/scaffold#1.0.plant_repeatFinal%2Elib%2Empi%2E10%2E0.specific -dir /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0 -pa 1 -lib /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/mpi_blastdb/plant_repeatFinal%2Elib.mpi.10/plant_repeatFinal%2Elib.mpi.10.0
> #-------------------------------#
> running  repeat masker.
> #--------- command -------------#
> Widget::RepeatMasker:
> cd /tmp/maker_VflRYH; ./maker3.01/exe/RepeatMasker/RepeatMasker /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0/scaffold#1.0.plant_repeatFinal%2Elib%2Empi%2E10%2E1.specific -dir /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0 -pa 1 -lib /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/mpi_blastdb/plant_repeatFinal%2Elib.mpi.10/plant_repeatFinal%2Elib.mpi.10.1
> #-------------------------------#
> running  repeat masker.
> #--------- command -------------#
> Widget::RepeatMasker:
> cd /tmp/maker_VflRYH; ./maker3.01/exe/RepeatMasker/RepeatMasker /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0/scaffold#1.0.plant_repeatFinal%2Elib%2Empi%2E10%2E2.specific -dir /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0 -pa 1 -lib /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/mpi_blastdb/plant_repeatFinal%2Elib.mpi.10/plant_repeatFinal%2Elib.mpi.10.2
> #-------------------------------#
> running  repeat masker.
> #--------- command -------------#
> Widget::RepeatMasker:
> cd /tmp/maker_VflRYH; ./maker3.01/exe/RepeatMasker/RepeatMasker /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0/scaffold#1.0.plant_repeatFinal%2Elib%2Empi%2E10%2E3.specific -dir /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/plant_datastore/B5/CD/scaffold#1//theVoid.scaffold#1/0 -pa 1 -lib /data/splitgenome/top1000/splitGenome/part1/plant.maker.output/mpi_blastdb/plant_repeatFinal%2Elib.mpi.10/plant_repeatFinal%2Elib.mpi.10.3
> #-------------------------------#
> running  repeat masker.
> #--------- command -------------#
> 
> .....
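
A log like the one above is easier to read as a tally. The following is a minimal sketch that counts the status messages seen in this thread; the message fragments are copied from the output above, while the function name and the category labels are hypothetical.

```python
# Message fragments copied from the MAKER output quoted in this thread.
PATTERNS = {
    "collision": "Another instance of maker is processing this contig!!",
    "multi_process_warning": "WARNING: Multiple MAKER processes have been started",
    "started": "Now starting the contig!!",
    "finished": "Maker is now finished!!!",
}


def summarize_maker_log(text: str) -> dict:
    """Return how many lines of `text` contain each known message fragment."""
    counts = {name: 0 for name in PATTERNS}
    for line in text.splitlines():
        for name, fragment in PATTERNS.items():
            if fragment in line:
                counts[name] += 1
    return counts
```

Many "collision" entries next to a single "started" entry would match the situation described in the reply: several independent MAKER runs competing for the same contig rather than one coordinated MPI job.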
> 
> On Mon, Oct 19, 2020 at 10:32 PM Carson Holt <carsonhh at gmail.com> wrote:
> Yes. It will divide contigs into chunks the same size as the max_dna_len parameter.
> 
> —Carson
> 
> Sent from my iPhone
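
To get a feel for the chunking described above, here is a minimal arithmetic sketch using the numbers from this thread (a 73580997 bp scaffold and max_dna_len=100000). It only splits a length into consecutive fixed-size pieces; MAKER's real chunking logic may differ in details such as boundaries or overlap, and the function name is hypothetical.

```python
def chunk_ranges(seq_len: int, max_dna_len: int):
    """Yield 0-based, half-open (start, end) pairs that cover seq_len."""
    if seq_len < 0 or max_dna_len <= 0:
        raise ValueError("seq_len must be >= 0 and max_dna_len must be > 0")
    for start in range(0, seq_len, max_dna_len):
        yield start, min(start + max_dna_len, seq_len)


ranges = list(chunk_ranges(73580997, 100000))
print(len(ranges))   # 736
print(ranges[-1])    # (73500000, 73580997)
```

Under this simplified model the scaffold yields 736 chunks, so a correctly communicating 40-process MPI job would have plenty of independent pieces to work on.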
> 
> > On Oct 17, 2020, at 12:48 AM, Xu, taosheng <taosheng.x at gmail.com> wrote:
> > 
> > 
> > Dear Maker Development Team,
> > I wonder whether MAKER supports parallel processing of a single long genome sequence.
> > 
> > When I submit my MAKER task using OpenMPI with multiple CPUs (e.g., mpiexec -n 40  maker) to annotate a single long genome sequence, only one MAKER process with 4 rmblast processes ever runs. The other CPUs sit idle.
> > 
> > I want to use MAKER's parallel processing with OpenMPI to speed up the annotation of a single ultra-long genome sequence.
> > 
> > Best regards,
> > Taosheng
