    0:  JMS adrs    Jump (indirect) to Subroutine
    1:  STR adrs    Store ACC at (Adrs)
    2:  IND adrs    Index on (Adrs)
    3:  BIC adrs    Bit-clear (Adrs) into ACC
    4:  ROR adrs    Load right-rotation of (Adrs) into ACC
    5:  COM adrs    Load 2s complement of (Adrs) into ACC
    6:  ADD adrs    Add (Adrs) into ACC
    7:  DCS adrs    Decrement ACC and jump to Adrs if not zero
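The table above is concrete enough to sketch in code. Below is a minimal, hypothetical Python emulator of one execution step; the 12-bit word size, the (mnemonic, address) memory representation, and the omission of JMS and IND are my assumptions, not part of the original design.

```python
# Hypothetical sketch of one execution step of the machine tabulated above.
# Assumptions: a 12-bit word, memory as a dict of address -> word, and each
# instruction stored as a (mnemonic, address) pair. JMS and IND are omitted
# for brevity.
WORD = 0xFFF  # 12-bit word mask (an assumption)

def step(mem, acc, pc):
    """Execute the instruction at pc; return the new (acc, pc)."""
    op, adrs = mem[pc]
    pc += 1
    if op == "STR":                       # store ACC at (adrs)
        mem[adrs] = acc
    elif op == "BIC":                     # bit-clear (adrs) into ACC
        acc &= ~mem[adrs] & WORD
    elif op == "ROR":                     # load right-rotation of (adrs)
        w = mem[adrs]
        acc = (w >> 1) | ((w & 1) << 11)
    elif op == "COM":                     # load 2s complement of (adrs)
        acc = -mem[adrs] & WORD
    elif op == "ADD":                     # add (adrs) into ACC
        acc = (acc + mem[adrs]) & WORD
    elif op == "DCS":                     # decrement ACC, jump if not zero
        acc = (acc - 1) & WORD
        if acc != 0:
            pc = adrs
    return acc, pc
```

Running ADD 10, ADD 11, STR 12 over data values 5 and 7, for instance, leaves 12 in ACC and in location 12; DCS alone is enough to build counted loops.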
One might ask whether this is still relevant today, in the age of CISCs with floating-point instruction sets and pipelined instruction streams. The answer is yes. There are many applications where processing is not the primary function: a memory chip or a communications chip, for instance, might be given rudimentary processing ability to get on with simple housekeeping tasks, without taking too much of the silicon real estate away from the primary function.
The goal of the instruction set of the Cobweb machine was different (Shute, 1992). This machine was designed to be completely non-von-Neumann.
SKIM had shown that it was possible to conceive of a machine that executed combinators as its primitive instruction set (Curry and Feys, 1958). But that machine was emulated in software, or at best in microcode, on a conventional, von Neumann-style computer. Similarly, the Manchester University Dataflow Machine (as described, for example, in Duce (1984)) was emulated using a conventional von Neumann-style program.
But the beauty of the S, K and I combinators (along with B, C, W, B', C' and S') is that each of them involves an extremely simple hardware operation (even after allowing for the needs of the garbage-collection mechanism).
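As a quick illustration of how little each combinator does, here are the unprimed ones written as curried Python closures (these are the standard definitions; the spelling is mine). Each merely discards, duplicates, swaps or regroups its arguments:

```python
# Standard combinator definitions as curried Python closures; each body is
# a trivial rearrangement of its arguments.
S = lambda f: lambda g: lambda x: f(x)(g(x))   # distribute x to both sides
K = lambda x: lambda y: x                      # discard the second argument
I = lambda x: x                                # identity
B = lambda f: lambda g: lambda x: f(g(x))      # composition
C = lambda f: lambda x: lambda y: f(y)(x)      # swap the two arguments
W = lambda f: lambda x: f(x)(x)                # duplicate the argument
```

The classic identity S K K = I, for instance, follows directly: S(K)(K)(v) reduces to K(v)(K(v)), which is v.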
Cobweb, therefore, was given an instruction set whose assembler language was itself a functional programming language, not unlike a simplified version of Lisp.
As well as the combinators, primitives were also included for operations on the integers (ADD, SUB, MULT, DIV, etc.). A program to evaluate a simple polynomial, for example, written in Sugar (Glaser, Hankin and Till, 1984) might be written:
    poly = [x] x*x - 9;
    poly(5) ?

and would compile to Cobweb code as:

    (poly: C' - (t1: S' * I I) 9) (outp: poly 5)

or, rather, as:

    (poly: C' - t1 9) (t1: S' * I I) (outp: poly 5)
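The compiled code can be spot-checked with a short sketch in Python, assuming Turner's one-argument primed combinators (C' c f g x = c (f x) g, and S' c f g x = c (f x) (g x)); curried closures stand in for the packets:

```python
# Turner's primed combinators (assumed definitions) and the compiled poly.
Cp = lambda c: lambda f: lambda g: lambda x: c(f(x))(g)     # C'
Sp = lambda c: lambda f: lambda g: lambda x: c(f(x))(g(x))  # S'
I = lambda x: x
sub = lambda a: lambda b: a - b   # the - primitive, curried
mul = lambda a: lambda b: a * b   # the * primitive, curried

t1 = Sp(mul)(I)(I)      # t1 x   =  x * x
poly = Cp(sub)(t1)(9)   # poly x =  (x * x) - 9

print(poly(5))          # prints 16
```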
A Cobweb machine processor was, in fact, an extremely simple block of hardware. Indeed, the aim was for it to be so small that it would be feasible to build a vast array of them on a single piece of silicon, and for the network to function as a highly parallel multiprocessor computer.
The processors would be connected simply to their physical neighbours, passing data as discrete packets. (4- or 8-way connectivity could be used, though 6-way connectivity between hexagonal processors was generally assumed.) Communicating those packets further afield would thus be achieved, in effect, by a packet-switched network.
Each packet would consist of up to six fields, with each field able to hold anything that can traditionally be represented in a word of memory (an integer, a floating-point number, an address or a primitive instruction). (Since combinator code binds from left to right, strings of any length are merely split up into several packets and linked together by addresses.)
The compiled code that was given earlier would be injected into the array as three packets:
    (poly: C' - t1 9 ;) (t1: S' * I I ;) (poly 5 ; outp:)

(Notice how the last one has been rotated left by one field by the compiler, which recognised that outp: is, in fact, not an address but a memory-mapped I/O primitive. The semicolon is merely an internal marker, and does not represent an extra field.)
Each packet would be injected into a processor in the network, which would then pass it on to an appropriate neighbour. To achieve this, the addresses would need to be organised systematically. For reasons discussed later, Cartesian coordinates are not feasible; instead, a spiral of working processors (a Catt spiral) would configure itself from the centre of the array, with radial connections across the layers of the spiral (hence the reference to cobwebs) serving as faster ways of travelling from one end of the spiral to the other. The processors were then simply allocated addresses according to their linear position on the spiral, and routing was achieved by comparing the magnitude of the address in the first field of the incoming packet with the magnitude of the address of the current processor.
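A minimal sketch of that routing comparison, assuming each processor knows only its own spiral position and those of its immediate neighbours (the function and its signature are hypothetical, not the original design):

```python
# Hypothetical routing step: forward the packet to whichever neighbour's
# spiral address is nearest the target, or keep it here if we are nearest.
def route(my_pos, neighbour_pos, target):
    best = min(neighbour_pos, key=lambda p: abs(p - target))
    if abs(best - target) < abs(my_pos - target):
        return best        # pass the packet on towards the target
    return my_pos          # this processor is the destination
```

A packet addressed to position 20 arriving at processor 10 with neighbours at 9, 11 and 14 would be forwarded to 14, the neighbour nearest the target along the spiral.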
Thus, in the above example, the first packet would be routed for storage in the processor whose address is poly, and the second to the processor whose address is t1. The third packet would also be routed to the processor whose address is poly. On arrival there, that processor would discover that it was already holding a packet in storage, and the two matching packets would be routed to the processor's ALU for processing.
In the case of matching packets, the ALU simply removes the first field of the application packet (the one starting with poly) and inserts in its place those of the definition packet (the one starting with poly:). (The definition packet would then be destroyed if its garbage counter reached zero as a result of this match.) Thus the packet

    (C' - t1 9 5 ; outp:)

would emerge from the ALU.
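The match step can be sketched in a few lines of Python, representing a packet as a plain list of fields with ';' as the internal marker (this representation is my assumption, chosen only for illustration):

```python
# Sketch of the match rule: drop the application packet's first field and
# splice in the definition packet's body (the fields before its ';').
def match(app, defn):
    body = defn[1:defn.index(';')]   # definition fields after the name
    return body + app[1:]            # spliced in ahead of the arguments

result = match(['poly', '5', ';', 'outp:'],
               ['poly:', "C'", '-', 't1', '9', ';'])
# result is ["C'", '-', 't1', '9', '5', ';', 'outp:']
```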
Packets emerging from the ALU are treated just like packets arriving from a neighbouring processor: a decision is made where to route them next. In this case, with a primitive operator, C', in the first field, the packet is routed back into the ALU for the operator to be executed.
In the case of executing a C' combinator, the fields of the original packet emerge rearranged on two packets:
    (- t2 9 ; outp:) (t2: t1 5 ;)

This behaviour is true of all of the combinators: none of them involves anything more complicated than taking an incoming packet and redistributing its fields on one or two outgoing packets. The only two minor complications are those of obtaining a unique address (t2 in this case) for the name of the new definition packet, and organising the garbage-collection information (which involved a partial-reference-counting mechanism).
The second of these packets would be routed off to the processor at address t2, while the first, with a primitive operator in its first field, would be routed to the ALU yet again.
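The C' rewrite just traced is typical of how the ALU redistributes fields; here is a hypothetical sketch of it (the packet-as-list representation and the fresh-name source are my assumptions):

```python
# Sketch of executing C' on a packet: (C' c f g x ; tail) becomes the main
# packet (c t g ; tail) plus a new definition packet (t: f x ;).
def exec_cprime(pkt, fresh):
    _, c, f, g, x = pkt[:5]     # the C' marker and its four spine fields
    tail = pkt[5:]              # the ';' marker and any continuation
    t = fresh()                 # obtain a unique address for the new packet
    return [c, t, g] + tail, [t + ':', f, x, ';']

main, defn = exec_cprime(["C'", '-', 't1', '9', '5', ';', 'outp:'],
                         lambda: 't2')
# main is ['-', 't2', '9', ';', 'outp:']; defn is ['t2:', 't1', '5', ';']
```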
Arithmetic operators can only be executed, though, if both of their arguments are numerical. The ALU behaves differently if one or both arguments have not yet been evaluated: it rotates the fields to the left again:
    (t2 9 ; outp: -)

and so the emerging packet is routed off to t2, too. When it arrives, it matches the previous packet, and results in the generation of a single new packet (which has been rotated right again following the match):
    (- t1 5 9 ; outp:)

Again, this is routed through to the ALU (albeit in a different processor this time) because of the primitive operator in the first field; but there is still an unevaluated argument, so the rotated packet emerges once more:
    (t1 5 9 ; outp: -)

This packet is, of course, routed off to the processor at address t1, where it is matched with the earlier packet. But instead of generating:
    (- S' * I I 5 9 ; outp:)

which would be too long to fit into a six-field packet, the ALU leaves them split into two packets:
    (t3 9 ; outp: -) (S' * I I 5 ; t3:)

And so the execution continues.
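The left and right rotations used throughout this trace are, in the packet-as-list representation assumed here, just one-field cyclic shifts:

```python
# Rotate an operator out of the way while its argument is fetched, and
# restore it afterwards; the two operations are exact inverses.
def rot_left(pkt):
    return pkt[1:] + pkt[:1]

def rot_right(pkt):
    return pkt[-1:] + pkt[:-1]

rot_left(['-', 't2', '9', ';', 'outp:'])
# gives ['t2', '9', ';', 'outp:', '-'], as in the trace above
```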
In fact, there is much scope for optimisation that the compiler could have applied to the original object code. Wherever an I combinator is produced, there is probably an optimisation that could have been applied. The string (S' * I I) can be optimised to (W *), for example.
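The claimed equivalence is easy to spot-check in Python, again assuming Turner's one-argument S' (S' c f g x = c (f x) (g x)) and the standard W (W f x = f x x):

```python
# (S' * I I) and (W *) both square their argument, so the five-field
# string can be replaced by a two-field one.
Sp = lambda c: lambda f: lambda g: lambda x: c(f(x))(g(x))  # S'
W = lambda f: lambda x: f(x)(x)
I = lambda x: x
mul = lambda a: lambda b: a * b

assert all(Sp(mul)(I)(I)(n) == W(mul)(n) for n in range(-5, 6))
```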
This gives a hint of the strengths and weaknesses of the machine. On the positive side, parallelism is extracted automatically by the hardware, with no need for the programmer to analyse where it is safe to put it in explicitly; plus, all of the operations are extremely simple, and easy to implement with very little hardware. On the negative side, the operations are indeed very simple, and program execution involves long chains of them (even on the optimised code), with much communication over a network of processors between instruction executions.
The usual approach to making a multiprocessor computer is to design the processor as an integrated circuit and to replicate it across the area of a silicon wafer. The individual processors are then tested, and the faulty ones marked with ink. The wafer is then diced into its individual integrated-circuit chips, and the unmarked (working) ones are put into packages. The packaged processor chips are each soldered on to printed circuit boards, in our case to form a multiprocessor computer.
So, why go to all the trouble of separating the processors on the wafer, if the intention is to solder them back together again on the printed circuit board? Why not just package the complete silicon wafer as a multiprocessor computer?
This is one reason for the attraction of wafer scale integration (though there are also several others).
One of the problems is the random positions of the faulty processors on the wafer. The wafer starts by being laid out as a regular grid of processors, but this is not what is delivered to the user. The Catt spiral technique allows a linear chain of working processors to be "harvested" from the irregular array, but still does not generate a regular 2D grid. Thus conventional, imperative programming languages (many of which had already been enhanced to include matrix operations to be executed on large array-processors, like DAP-Fortran) appeared not to be the answer.
Functional programming was being researched as a means of managing the complexity of mapping unpredictably shaped program execution graphs on to single-processor computers. It seemed that it also offered the best way of mapping them on to unpredictably shaped layouts of multiple processors.
The aims of these two directions in research were eventually achieved, but not in the way that their researchers had originally intended. The lessons were instead taken on board by the designers of conventional computing systems.
As a result, the industry could gain the advantages of the new research without the disadvantage of having to change too far from the old ways of working.