Gerhard Niklasch on Thu, 24 May 2001 03:25:49 +0200 (MEST)


Re: PARI MPQS I/O


In response to:
> Message-ID: <Pine.NEB.4.33.0105231417570.22175-100000@mercury>
> From: Andy Brown <logic@warthog.com>
> To: <pari-users@list.cr.yp.to>
> 
> When UNIX PARI does MPQS, it uses several temporary files in /var/tmp.

Or wherever you point GPTMPDIR in the environment.
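
A minimal sketch of pointing GPTMPDIR somewhere else before starting gp,
written in Python for illustration.  The helper name gp_env and the path
/tmp/pari-scratch are hypothetical; only the GPTMPDIR variable itself is
taken from the remark above.

```python
import os

def gp_env(tmpdir, base_env=None):
    """Return a copy of the environment with GPTMPDIR set to tmpdir."""
    # Copy so the caller's (or the process's) environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["GPTMPDIR"] = tmpdir
    return env

# Usage (assumes a `gp` binary on PATH and an existing scratch directory;
# deliberately not executed here):
#   import subprocess
#   subprocess.run(["gp", "-q"], env=gp_env("/tmp/pari-scratch"))
```

Passing a modified copy of the environment keeps the change local to the
one gp process instead of altering the calling shell or script.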

> Tracing the GP process shows it doing a lot of I/O to these files as MPQS
> gathers relations.
> 
> This is so much I/O that it seems to be a bottleneck.  My Ultra 10 333MHz
> is faster than a friend's Alpha 21264 500MHz doing MPQS on >50-digit
> composites!
> 
> Is there a way to eliminate these temp files, or at least make MPQS write
> them less often?

Nope, and you don't want to.  (Au contraire, twiddling with the parameters
to make MPQS write out even more partial relations is likely to speed
things up somewhat, at the potential expense of lots of disk space.)

The write(2) system calls happen in blocks anyway (4K or 8K depending
on the precise flavor of UNIX you happen to be on, and on the hardware).

On Solaris, when you point GPTMPDIR at a tmpfs directory, the metadata
(file length etc.) are never written to disk, and even on a normal UNIX
file system, the data themselves are likely to remain in RAM long enough
for the Gauss stage to find them there when we're doing a number of 20-40
digits.  These factorizations would never have to wait for something to
be read back from disk unless the system is already severely overloaded,
and they never have to wait for stuff to be written to disk because that
is done asynchronously by kernel and hardware long after the write syscall
has returned to the process.
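
To see the asynchronous behaviour for yourself, here is a rough Python
sketch that times a run of block-sized writes separately from a final
fsync.  It is not what PARI does internally; the function name
time_writes and the default block count and size are assumptions chosen
for illustration, and the timings will vary from system to system.

```python
import os
import tempfile
import time

def time_writes(nblocks=256, blocksize=8192, directory=None):
    """Write nblocks blocks, then fsync.

    Returns (bytes_written, write_seconds, fsync_seconds).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    block = b"x" * blocksize
    try:
        t0 = time.perf_counter()
        total = 0
        for _ in range(nblocks):
            # write(2) normally returns once the kernel has buffered the data.
            total += os.write(fd, block)
        t1 = time.perf_counter()
        # fsync(2) blocks until the data have been handed to the device.
        os.fsync(fd)
        t2 = time.perf_counter()
    finally:
        os.close(fd)
        os.unlink(path)
    return total, t1 - t0, t2 - t1

if __name__ == "__main__":
    nbytes, t_write, t_sync = time_writes()
    print(f"{nbytes} bytes: write {t_write:.6f}s, fsync {t_sync:.6f}s")
```

On a disk-backed file system the fsync step would typically account for
most of the elapsed time, which is the cost a process skips when it
never asks for synchronous writes.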

When you're doing 70-digit or 85-digit numbers, things change, of course:
data blocks will be written to disk and will expire from the buffer cache
long before they are read back for the Gauss stage.  But then the writes
happen so infrequently compared to the sieving that we're >>99% CPU
bound anyway.

If you set the debug level to 5 with \g5, you'll see that much more time
is spent in the sieving phases than in the sorting phases.  That is so
even though the sorting phases read back the files written so far and
write out new ones, so more data are moved around at each such
checkpoint than during the sieving phases in between!

You'd need to do some serious profiling to find out where the Ultra
runs away from the Alpha, but I suspect it's the sieving and not the
I/O.  And the sieving is highly sensitive to cache sizes and the like.

Cheers, Gerhard