Gerhard Niklasch on Thu, 30 Jul 1998 15:32:08 +0200 (MET DST) |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
Re: SEGV on bad recursion |
In response to: > Message-Id: <19980728150605.Z10326@io.txc.com> > Date: Tue, 28 Jul 1998 15:06:05 -0400 > From: Igor Schein <igor@txc.com> > > \\ f() is an recursively defined identity map on Z+ \ {1} > f(n)=if(n==2,2,f(n-1)+1) > f(1) > > That's where gp segfaults and dies. Here's what we found between us with a lot of testing, debugger- exercising, and near-crashing a few workstations. (1) Of course we don't expect this f(1) to return a useful result, ever. (2) The parser dutifully descends into the recursion, allocating a little parameter block on the PARI stack, copying the current value of n into that, pushing the little stack of values associated to the variable n, copying the string `if(n==2,2,f(n-1)+1)' to a chunk of malloc()ed memory, and calling itself recursively on that. (Not quite in this order, and I'm skipping a number of steps. There are actually about 8 nested calls of C functions in the parser for each recursion level.) (3) Thus the C stack, which is an animal entirely separate from the PARI stack, is filling up with call frames of C functions. (4) On SunOS/Solaris where Igor originally observed this behavior, the C stack sits in a fixed-size (8MBy) memory segment, and fills up this segment before the parameter blocks exhaust the default PARI stack, and before the other allocations exhaust the available virtual memory. (5) When this happens, a SIGSEGV is raised. Actually, we first get a segmentation exception from the memory management unit of the processor, which the Solaris kernel handles on its own little interrupt stack, and then the kernel tries to return control to gp's SIGSEGV handler in userland. (6) However, the signal handler is now faced with an already overflowing C stack, and cannot allocate its own stack frame, so the whole affair dumps core. (Actually, the kernel probably notices this before returning control to the gp process at user level, and initiates the core dump at once.) The behavior varies a bit depending on the operating system. Under Linux, there's no hard limit at 8MBy, and the process goes on for quite a bit longer, eating more and more virtual memory and slowing down the machine on which it is running to a crawl unless interrupted in time. (However, the Solaris phenomena happen both on UltraSparcs and on Solaris PCs, so the processor architecture plays a minor role here.) There doesn't appear to be a reliable and portable way to catch the problem just before we are beyond recovery. (Repeat: we _are_ catching SIGSEGVs, but when we're out of space on the C stack, the signal handler cannot run.) If someone does know of a reliable and portable way, we'd appreciate hearing about it. This leaves us with a bit of a dilemma. On the one hand, it is perfectly within the PARI philosophy to give users enough rope to hang themselves if they are so inclined. On the other, some users might appreciate some sort of safety valve. It would not be hard to add another gp default setting (changeable at runtime) which would limit the maximal nesting depth of gp user functions, say to a few thousand (4000 ought to be safe in all environments we have tried this on). Then if someone desperately needs a depth of 4002, they could change the default parameter, and get what they want, and if someone absolutely wants to shoot him-, her- or itself in the foot, they could set it to 0 meaning `unlimited', and crash their own machine or at least their gp process. Exceeding the nesting depth would _usually_ be a symptom of erroneous (or at least ill-conceived, if not, as in Igor's case, intentionally malicious ;^) input to gp, anyway. Comments? Opinions? Gerhard