Diminished-C (DimC) was originally designed
for a high-level programming languages class
on a microelectronics M.Sc. at Middlesex Polytechnic
(now Middlesex University).
The idea was that the coursework exercises should be set with the underlying theme
of gradually working towards the building of a simple, high-level language compiler).
The two main components of the compiler were a symbol table,
based on a simple array,
and an operator-precedence parser,
based on a linked-list binary tree.
For the first exercise, though, the students merely wrote a program to take in ASCII text,
and to output the same text with the comments removed.
Then, in the next exercise, individual symbols were recognised, and placed in the table.
Gradually, over successive weeks, more functionality was added to the symbol table,
and the parse-tree was eventually added.
Finally, the output from the compiler
became the object code for a microprocessor board
that the students were designing in a hardware class.
The source language of the compiler is Diminished-C,
which is largely a cut-down version of the C programming language,
with a few Pascal-like constructs added,
where these were more convenient to implement
in the short time that the students had available.
DimC allows for function definition (implicitly of type integer);
and simple integer assignment operations, =,
with right-side expressions containing integer constants, user variables, user functions,
the primitive getc() function to fetch a single ASCII character,
and the diadic arithmetic operators, + - * / %.
(Monadic negation, in DimC, can only be achieved by diadic subtraction from zero.)
The diadic bitwise operators could also be added to the compiler,
if the students had time.
Diminished-C also has a conditional "if" operator
that takes a single parameter containing expressions with diadic relative operators,
==, <, <=, >, >=, != (or a Pascal-like <>).
Also, Pascal-like, the next symbol had to be the "then" keyword.
In Diminished-C, there is no "else" branch.
Similarly, there are no compound or nested statement structures, and no "while" loops.
The effects of each of these are achieved, instead, using recursive function calls.
/* Test program, written in DimC, to read in a hexadecimal integer
and to print out the decimal equivalent in reverse order */
int num; /* global variable(s) */
get16digit() /* integer function */
int ans, ch; /* local variables */
ch = getc();
ans = 0-999; /* -999 is the end flag */
if( ('0'<=ch)*('9'>= ch) ) then ans = ch-'0';
if( ('A'<=ch)*('Z'>= ch) ) then
ans = ch-'A'+10;
if( ('a'<=ch)*('z'>= ch) ) then ans =
ans /* value to be returned */
} /* no ";" separator on last line */
get16num(x) /* integer argument(s) */
int digit, ans;
ans = x; digit = get16digit();
if( digit >= 0 ) then ans = x*16 + digit;
if( digit >= 0 ) then ans = get16num(ans);
} /* recursive function */
putc( (val%10) + '0' );
if( (val/10) > 0 ) then putr10num( val/10 )
} /* recursive procedure */
/* the test program body */
num = get16digit(0);
The operator-precedence parser works by assigning a precedence level to every symbol
that it meets in the source file.
For instance, + and - might have a precedence level 3,
with * / % on a higher precedence level of 4,
and ^ an even higher precedence level of 5.
The relational operators would have a lower precedence level, 2,
the assignment operator a precedence level of 1,
and the statement and argument separators ";" and ","
the lowest precedence level, 0.
All integers and user-defined symbols (variables and functions)
would have the highest precedence level, so far, 7.
Lastly, each opening parenthesis would cause all the subsequent precedence levels
to be incremented by 10,
and each closing parenthesis would cause subsequent ones
to be decremented back down again.
the symbols (and their precedence levels) in the expression "y=a+b*(42+max(c,d))" are:
y(7) =(1) a(7) +(3) b(7) *(4) 42(17) +(13) max(17) c(27) ,(20) d(27).
The parse tree is grown as soon as the parser encounters an opening brace,
and stops when it encounters the matching closing brace.
At each new symbol read in,
a new node is inserted in the tree, containing a record of the symbol and its precedence level.
If its precedence level is less-than-or-equal to that of the previously inserted node,
the new node takes the place of the old one,
and the old one accessed via the left-link of the new node;
else, the new node is inserted as a right-link of the previously inserted node.
(It is worth noting that it is the "-or-equal" clause
that enforces the left-to-right precedence in expressions without parentheses.)
Upon reading the closing brace for the function definition,
the parse tree is read out.
This process can be considered to be like an adventurer exploring a simple labyrinth.
He enters the parse tree at the top, armed only with a portable tape-recorder,
and proceeds to explore the labyrinth while holding on to the right hand wall.
Whenever he exits a node by the left or right passages,
he says "OPEN" into the microphone,
and whenever he enters the node again by the left or right passages,
he says "CLOSE".
Since he enters each node three times (once each from the top, left and right passages)
he has three opportunities for naming the node into the microphone.
The students experiment with all three rules,
and show that the input text is correspondingly converted
into prefix, infix or postfix notation.
At the end of all this,
the students set the compiler to produce postfix notation.
Because the program execution was assumed to be stack-based,
it is possible to dispense with the "OPEN" and "CLOSE" information
(representing open and close parenthesis).
The + - * and /
are encoded as ADD, SUB, MULT and DIV instructions,
possibly with sequences of a few other instructions to manipulate the data stack
(such as "POP a;POP b;ADD a,b;PUSH b".
Lastly, "," is a NOP, and ";" pops the next-but-one item off the stack.
A rudimentary linking-loader was also written, if time allowed,
to resolve references to library functions,
such as to the "getc" and "putc" functions
for the input/output of a single ASCII character.
Otherwise, these, too, could simply be converted to a fixed sequence of instructions.