Casper H.S. Dik said:
"leaf subroutine" is also used on SPARC (leaf subroutines don't bother
to use a register window and they return with "retl", return from leaf
subroutine)
the original cp/67 kernel that i got in jan '68 had all intra-kernel
linkages via 360 supervisor call. the kernel convention was that the
svc call interrupt routine would dynamically allocate a savearea for
the called routine ... also do a little bookkeeping and trace entry
for debugging. the svc return would unallocate the savearea and return.
one of the pathlength things i did was go thru all of the kernel and
identify all leaf routines. i identified these and modified the kernel
call macro to recognize a leaf routine was being called and do a BALR
in place of an svc8. i then defined a fixed (unused/reserved) location
in page zero for temporary register save ... this then eventually came
to be called "balrsave" (so the leaf routine saved caller's registers
in page zero temporary area rather than in a passed save area).
the next thing was to go thru and identify all non-leaf routines that
only made calls to leaf routines ... these then became sort of 2nd
order leaf routines. these were also modified so that the caller used
BALR in place of svc8. however, these routines instead of using
"balrsave" for temporary save of the caller's registers used an
adjacent area that came to be called "freesave".
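The two classification passes above can be sketched roughly as follows. This is only an illustration in Python, not the original 360 assembler macro: the call graph and the routine names in it are made up, and the only things taken from the description are the three outcomes (BALR with "balrsave", BALR with "freesave", or the regular svc linkage with a dynamically allocated savearea). Presumably the 2nd order routines needed their own area because any leaf routine they called would overwrite "balrsave".

```python
# Hypothetical call graph: routine name -> kernel routines it calls.
# These names are invented for illustration; they are not real CP/67 routines.
CALL_GRAPH = {
    "trace": [],
    "getfree": [],
    "lookup": ["trace"],
    "pageio": ["getfree", "trace"],
    "dispatch": ["lookup", "getfree"],
}


def classify(graph):
    """Map each routine to (linkage, register save area) per the scheme above."""
    # Pass 1: leaf routines make no calls at all.
    leaves = {name for name, callees in graph.items() if not callees}
    result = {}
    for name, callees in graph.items():
        if name in leaves:
            # Leaf: called with BALR, saves caller's registers in page zero.
            result[name] = ("BALR", "balrsave")
        elif all(callee in leaves for callee in callees):
            # Pass 2: "2nd order" leaf, calls only leaf routines.
            # Uses the adjacent area so a called leaf can still use balrsave.
            result[name] = ("BALR", "freesave")
        else:
            # Everything else keeps the svc linkage and dynamic savearea.
            result[name] = ("SVC", "dynamic savearea")
    return result


if __name__ == "__main__":
    for routine, (linkage, save_area) in sorted(classify(CALL_GRAPH).items()):
        print(f"{routine:10s} {linkage:5s} {save_area}")
```

Note that the scheme stops at two levels: a routine that calls a 2nd order routine stays on the svc path, since there are only two fixed save areas to go around.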
For various reasons, the svc8/svc12 calling convention originally took
approx 275 microseconds on 360/67 (per call) ... it was possible to
optimize that down to about 100 microseconds by recoding some of the
stuff used for debugging purposes. Several of the leaf routines were
high frequency calls and performed operations on the order of a hundred
microseconds or less ... and therefore the svc8/svc12 calling
convention was on the order of half the processing time for those calls.
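As a rough check on the "half the processing time" figure, the arithmetic can be sketched as below. The 275 and 100 microsecond linkage numbers are from the text above; the 100 microseconds of useful work per leaf call is an assumed round number standing in for "on the order of a hundred microseconds or less", and the function name is just for illustration.

```python
def linkage_share(linkage_us, work_us):
    """Fraction of a call's total elapsed time spent in call/return linkage."""
    return linkage_us / (linkage_us + work_us)


ORIGINAL_SVC_US = 275.0   # svc8/svc12 linkage before optimization (from the text)
OPTIMIZED_SVC_US = 100.0  # svc8/svc12 linkage after optimization (from the text)
LEAF_WORK_US = 100.0      # assumption: typical work done by a short leaf routine

if __name__ == "__main__":
    for label, linkage in (("original", ORIGINAL_SVC_US),
                           ("optimized", OPTIMIZED_SVC_US)):
        share = linkage_share(linkage, LEAF_WORK_US)
        print(f"{label:9s} linkage: {share:.0%} of the time per call")
```

With these numbers the optimized svc linkage still accounts for about half of each such call, and the share only gets worse for leaf routines that do less than a hundred microseconds of work.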
The svc call to BALR change picked up something like 20-30 percent of
(remaining) kernel time ... on a kernel that I had already optimized
to pick up something like 80 percent with the fastpath changes described
in previous posts:
http://www.garlic.com/~lynn/2004f.html#6 Infiniband - practicalities for small clusters
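To see how the two improvements compound, here is a small sketch. It assumes that "pick up 80 percent" means kernel time was cut by 80 percent, and that the 20-30 percent from the BALR change applies to what was left after that; both readings are assumptions, and the function name is hypothetical.

```python
def remaining_fraction(*savings):
    """Fraction of the original time left after applying each saving in turn."""
    remaining = 1.0
    for saving in savings:
        remaining *= 1.0 - saving
    return remaining


FASTPATH_SAVING = 0.80             # earlier fastpath work (from the text)
BALR_SAVING_RANGE = (0.20, 0.30)   # svc-to-BALR change, applied to the remainder

if __name__ == "__main__":
    for balr_saving in BALR_SAVING_RANGE:
        left = remaining_fraction(FASTPATH_SAVING, balr_saving)
        print(f"BALR saving {balr_saving:.0%}: "
              f"{left:.0%} of original kernel time remains")
```

Under those assumptions the BALR change is worth roughly 4 to 6 percent of the original kernel time, leaving something like 14 to 16 percent of the original.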
the earlier 80 percent kernel overhead optimization (presented at fall
'68 boston share) had included various interrupt and dispatching
fastpath as well as special case fastpath for various virtual machine
simulation operations. It also included the reduction in the svc8/12
call/return overhead from 275 microseconds to around 100 microseconds
... but didn't include the BALR call changes for leaf routines.
The BALR call changes were done the following summer (of '69), when I
got conned into going to Boeing (student summer employee with a
fulltime management job classification level and a badge that let me
park in the management parking lot at corporate hdqtrs next to boeing
field) to help get BCS set up and operational. That summer, I also did
the first version of the dynamic adaptive fair share scheduler, the
global LRU page replacement, and the hack that allowed portions of the
cp kernel to be non-resident and pageable.
scheduler refs:
http://www.garlic.com/~lynn/subtopic.html#fairshare
page replacement refs:
http://www.garlic.com/~lynn/subtopic.html#wsclock