[maker-devel] Issue due to intensive I/O
Carson Holt
carsonhh at gmail.com
Tue Jan 20 09:45:01 MST 2015
Genome annotation is data intensive rather than CPU intensive. In MAKER, most I/O-intensive operations occur in a temporary directory pointed to by the TMP= option in the MAKER control files. If you are setting this value to a location on a network-mounted drive, that could be the source of your problem. TMP= also defaults to the value of the TMPDIR Linux environment variable, so make sure that TMPDIR is not set to a network-mounted location either. The temporary directory needs to be a locally mounted location. MAKER will still need to write a number of files to the global file system; however, we have previously run MAKER on over 8,000 CPUs on Lustre file systems with no issues.
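As a rough way to check the point above on a compute node, the sketch below looks up which file system type backs the directory named by TMPDIR. It is not part of MAKER. It assumes a Linux node with /proc/mounts, the set of network file system types is an illustrative guess that your site may need to extend, and the function names are made up for this example. If you set TMP= explicitly in maker_opts.ctl, check that path the same way.

```python
import os

# Assumption: a non-exhaustive list of network/parallel file system types.
NETWORK_FS_TYPES = {"nfs", "nfs4", "lustre", "cifs", "smb3", "gpfs",
                    "beegfs", "ceph", "afs", "glusterfs"}


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        # Later entries win ties because they shadow earlier mounts.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_network_fs(fs_type):
    """True if fs_type is in the assumed set of network file systems."""
    return fs_type in NETWORK_FS_TYPES


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    # Resolve symlinks so the path is matched against the real mount point.
    tmp_dir = os.path.realpath(os.environ.get("TMPDIR", "/tmp"))
    with open("/proc/mounts") as handle:
        fs_type = fs_type_for(tmp_dir, parse_mounts(handle.read()))
    verdict = "network mounted, move it" if is_network_fs(fs_type) else "looks local"
    print(f"{tmp_dir}: {fs_type} ({verdict})")
```

Run it inside a job on the compute node rather than on the login node, since the two can have different TMPDIR values and different mounts.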
If the genome being annotated has a large number of small contigs, it is possible that the metadata server, rather than the object storage server, is the one having problems. Lots of small contigs in a fragmented genome assembly result in a lot of small result files, but very little reading and writing. Such a situation can be quite stressful for Lustre file systems because they don’t like having large numbers of very small files (it overwhelms the metadata server even though the object storage server is under only moderate load). If that is the case, make sure you are setting min_contig= to something like 10000 to avoid generating analyses for short, un-annotatable contigs (they may number in the hundreds of thousands on lower quality genome assemblies and contain no useful information). You can also set clean_up=1 in the MAKER control files to delete files as MAKER advances. This removes restart capability because you won’t have logged results from previous runs, but it will reduce the burden on the metadata server (which is affected by total file number as opposed to file read/write operations). Setting clean_up=1 can also help you avoid any administrator-defined limits on total file number per user (administrators commonly set this limit on Lustre-based file systems to avoid taxing the metadata server).
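To judge whether a fragmented assembly is the likely cause, it can help to count how many contigs fall under a given min_contig value before launching a run. The sketch below does that for a FASTA file. It is a standalone script, not a MAKER tool, the function names are invented for this example, and it assumes that contigs strictly shorter than the threshold are the ones skipped.

```python
import sys


def contig_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def summarize(lengths, min_contig=10000):
    """Count contigs that fall below min_contig (assumed strict '<')."""
    lengths = list(lengths)
    short = [n for n in lengths if n < min_contig]
    return {
        "total": len(lengths),
        "skipped": len(short),
        "kept": len(lengths) - len(short),
        "skipped_bases": sum(short),
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python count_short_contigs.py genome.fasta
    with open(sys.argv[1]) as handle:
        stats = summarize(n for _, n in contig_lengths(handle))
    print(stats)
```

If most contigs land in the "skipped" count while "skipped_bases" is only a small share of the assembly, raising min_contig removes many result files at little cost in annotatable sequence.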
So your issue is likely caused by one of two things:
1. Setting TMP= in the maker_opts.ctl file, or the Linux TMPDIR environment variable, to a network-mounted location. Fixed by setting these to a locally mounted location (usually /tmp).
2. Too many total files being generated by a fragmented genome assembly. Fixed by either setting min_contig=10000 to skip short contigs or setting clean_up=1 to avoid logging too many files. This happens because it is very difficult to overwhelm Lustre's object storage servers (which perform I/O read/write operations), but it’s relatively easy to overwhelm the metadata server (affected by total file count rather than total I/O throughput).
—Carson
> On Jan 19, 2015, at 5:55 AM, Stephen Wang <wangyichao at sjtu.edu.cn> wrote:
>
> Dear MAKER Team,
>
> I am a cluster administrator at the university. The issue is caused by MAKER jobs, which access a massive number of small files and crashed the Lustre file system.
>
> Hardware: 16 cores per node
> Software: OpenMPI 1.6.5 and GCC 4.9.1
>
> Q1: Does MAKER have to generate a large number of files on the global file system?
> Q2: Can any parameters help MAKER avoid I/O intensive access? Any experience on Lustre?
>
> MAKER is quite an important piece of software for our users. We hope you can help.
>
> BR,
> Stephen
>
> --
> Stephen Wang, GPU Computing Specialist
> Center for High Performance Computing
> Shanghai Jiao Tong University
> Room 205 Network Center, 800 Dongchuan Road, Shanghai 200240 China
> Mobi:+86-136-6151-1618 Web:http://hpc.sjtu.edu.cn