
Towards an Algebra for Genealogy

Towards a Notation for Family Relationships

Natural languages give names for important blood relatives: mother, father, uncle, grandmother, daughter. It is convenient to find a way of codifying these in a notation that is concise and logical. The notation used here consists of two integers, representing steps up and down the family tree respectively, separated by a binary value (x or y) to represent female or male.

Thus 1y0 is the male person one step up the tree, and no steps down, namely my father. A similar code can be given for each of the other commonly occurring blood relations:

2x2  Cousine (to use the French)

From this notation, it becomes apparent that siblings are zeroth-cousins, uncles and aunts are zeroth-cousins-once-removed-senior, and parents, grandparents and great-grandparents are similarly zeroth-, first- or second- cousins once-, twice- or thrice-removed (senior).

Here are a few examples going in the other direction on the tree:

3y4  Second-cousin once removed junior

This notation already has the advantage of being concise, and much shorter than the equivalent expression in English (for writing on the family tree, for instance).
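As a minimal sketch of how a program might read such codes, the following Python fragment splits a code into its three parts and derives the cousin terminology used above (siblings counted as zeroth-cousins). The function names `parse` and `cousin_terms` are illustrative assumptions, not part of the notation, and the direct line (where either integer is zero) is deliberately left out of the sketch.

```python
import re

# One simple relationship: steps up, sex letter (x or y), steps down.
CODE = re.compile(r"^(\d+)([xy])(\d+)$")

def parse(code):
    """Split a code such as '3y4' into (steps_up, sex, steps_down)."""
    match = CODE.match(code)
    if match is None:
        raise ValueError(f"not a relationship code: {code!r}")
    up, sex, down = match.groups()
    return int(up), sex, int(down)

def cousin_terms(code):
    """Return (degree, times_removed, direction) for a collateral relative.

    Assumption of this sketch: siblings (1x1, 1y1) are zeroth-cousins,
    so the degree is one less than the smaller of the two integers.
    """
    up, _sex, down = parse(code)
    if up == 0 or down == 0:
        raise ValueError("direct ancestors and descendants are not handled here")
    degree = min(up, down) - 1
    removed = abs(up - down)
    direction = "senior" if up > down else "junior" if down > up else ""
    return degree, removed, direction

print(cousin_terms("3y4"))
```

Here `cousin_terms("3y4")` yields degree 2, removed once, junior, matching the second-cousin once removed junior of the example.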

Incidentally, the letters x and y were chosen to correspond to the names of the female and male chromosomes. The letters f and m had been considered for female and male, but these led to a great deal of confusion, since the most commonly expressed relationships are mother and father, which have the same initials, but reversed.

There is one other important relationship that needs to be added to the above lists: self (0y0) (or 0x0 for a female speaker). Using this, the next table shows how the notation can be extended to represent various spouse relationships. The notation for aunt-by-marriage, for example, is coded as "my uncle's spouse's self".

1x1s0y0  Brother-in-law (sister's spouse)
0y0s1y1  Brother-in-law (spouse's brother)
1y1s0x0  Sister-in-law (brother's spouse)
0y0s1x1  Sister-in-law (spouse's sister)
1y0s1y0 or 1x0s1y0  Grand-stepfather

When written on the tree, the "self" notation can be omitted, and left implied, as in (2y1s) for my aunt-by-marriage (my uncle's spouse), and (s1x0) for my mother-in-law (my spouse's mother). However, it has been left explicit for the descriptions on this page.

Whenever cousins (or other relations) marry together in the family tree, their descendants end up, from my perspective, with two alternative notations (which might or might not be the same). The notation gives us a way of determining which of the alternative relationships is the stronger.

There is obviously scope to cascade the spouse notation ad infinitum, for example to represent my uncle's spouse's great-aunt's spouse's brother (2y1s3x1s1y1). It is possible, also, to add other pseudo-relationships, such as friend, just for the sake of concise notation in a description on a family tree. For this, the letter "a" stands for "ami", again so as to avoid the letter "f". Thus, a reference to my grandmother's friend (as a bridesmaid on a wedding photograph, for example) might be indicated as (2x0a0x0).
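A cascaded code can be broken into its links mechanically, since every simple relationship is separated from the next by an "s" or an "a". The following is only a sketch under that assumption; the name `split_chain` and the choice of tuples for the result are hypothetical.

```python
import re

# Either a simple relationship (up, sex, down) or a link letter:
# 's' for spouse, 'a' for ami (friend).
TOKEN = re.compile(r"(\d+)([xy])(\d+)|([sa])")

def split_chain(code):
    """Split a cascaded code into relationship triples and link letters."""
    parts = []
    pos = 0
    while pos < len(code):
        match = TOKEN.match(code, pos)
        if match is None:
            raise ValueError(f"cannot parse {code!r} at position {pos}")
        if match.group(4):
            parts.append(match.group(4))
        else:
            parts.append((int(match.group(1)), match.group(2), int(match.group(3))))
        pos = match.end()
    return parts

print(split_chain("2y1s3x1s1y1"))
```

The abbreviated forms with the "self" left implied, such as 2y1s and s1x0, pass through the same function; they simply begin or end with a link letter.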

Narrowing down the Year Ranges

As an aside from the main story, but still in the context of "functionality that a family-tree computer program can incorporate", we can note that one important way of tracing ancestors is to look for birth and marriage certificates. This involves looking in the registers of the appropriate year and place. If the year is not known precisely, but only to a range of ±5 years, then all eleven registers in that range need to be consulted, one by one.

The aim of the "year-ties" notation is to get the computer to narrow down the range of years as tightly as possible, even where the exact year is unknown, to make the process of searching for birth and marriage certificates easier.

Every event (year of birth, YoB; year of marriage, YoM; year of death, YoD) is first represented as a pair of integers, expressing a range: (YoB.min, YoB.max), (YoM.min, YoM.max), (YoD.min, YoD.max). If the date is known exactly, then the min and max fields are set to the same value. Similarly, if the range of possible dates is known, then the min and max fields are set accordingly. Otherwise the fields are set to the default range of 1066 (say) in the min field, and the current year in the max field. (In the case of the YoD, the max field can be left blank (zero) to indicate "not yet dead").

The "tiemax(A.YoE, B.YoF, val)" procedure asserts that the time from the first event, A.YoE, to the second event, B.YoF, cannot be greater than "val". The "tiemin(A.YoE, B.YoF, val)" procedure asserts that the time from the first event to the second cannot be less than "val". At their hearts, these procedures behave as follows:

tiemax(A.YoE, B.YoF, val) BEGIN
  IF( (B.YoF.min-A.YoE.min) > val )
    THEN A.YoE.min := B.YoF.min-val;
  IF( (B.YoF.max-A.YoE.max) > val )
    THEN B.YoF.max := A.YoE.max+val
END

tiemin(A.YoE, B.YoF, val) BEGIN
  IF( (B.YoF.min-A.YoE.min) < val )
    THEN B.YoF.min := A.YoE.min+val;
  IF( (B.YoF.max-A.YoE.max) < val )
    THEN A.YoE.max := B.YoF.max-val
END

In reality, these procedures cannot be quite as simple as this, since various checks and safeguards need to be included; for example, to check that the min value is not allowed to pass the max value, and vice versa.
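One possible rendering of these two procedures in Python, with the simplest of the safeguards added, is sketched below. The class name `YearRange`, its fields `lo` and `hi` (standing for the min and max fields), and the decision to raise an error when a range becomes empty are all assumptions of the sketch; the blank "not yet dead" value is not handled.

```python
from dataclasses import dataclass

@dataclass
class YearRange:
    lo: int   # earliest possible year (the "min" field)
    hi: int   # latest possible year (the "max" field)

    def check(self):
        # Safeguard: the min value must never pass the max value.
        if self.lo > self.hi:
            raise ValueError(f"empty year range: {self.lo}..{self.hi}")

def tiemax(a, b, val):
    """Assert that event b is at most val years after event a."""
    a.lo = max(a.lo, b.lo - val)
    b.hi = min(b.hi, a.hi + val)
    a.check()
    b.check()

def tiemin(a, b, val):
    """Assert that event b is at least val years after event a."""
    b.lo = max(b.lo, a.lo + val)
    a.hi = min(a.hi, b.hi - val)
    a.check()
    b.check()

# A death known to be in 1900, with the birth year completely unknown.
birth = YearRange(1066, 2010)
death = YearRange(1900, 1900)
tiemax(birth, death, 111)   # nobody lives beyond 111
tiemin(birth, death, 0)     # nobody dies before being born
print(birth)
```

After the two calls, the birth range has been narrowed to 1789..1900, and the death range is unchanged.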

Having defined these procedures, the whole database can be scanned repeatedly for a number of constraints:

tiemax( Z.YoB, Z.YoD, 111 )
Since no person, Z, in my family is known to have made it into the Guinness Book of Records (GBR), I can assume that no-one has lived beyond the age of 111 (say).
tiemin( Z.YoB, Z.YoD, 0 )
No person, Z, has died before his or her birth-date. (Luckily, family trees do not worry about technical exceptions to this rule).
tiemin( Z.YoB, mth_child(Z).YoB, 13+m )
Again, because of the GBR principle, I can assume that no person, Z, in my family has had a first child before the age of 14, and hence an mth child before the age of 13+m. This further assumes that consecutive children are not born in the same year. This is not strictly true, but is a good enough approximation not to need overriding too often. The computer program does, though, need to be sophisticated enough to cope with twins, and parts of the tree where the order of births in a family is presently unknown.
tiemax( X.YoB, mth_child(X).YoB, 55-numbofchildren(X)+m )
Again, because of the GBR principle, I can say that no female person, X, in my family is known to have given birth to a last child beyond the age of 55.
tiemax( Z.YoD, child(Z).YoB, 0 )
No person, Z, has had a child born after his or her own year of death. (To be safer, the value 0 applies to mothers, X, and the value 1 to fathers, Y).
tiemin( elder_sibling(Z).YoB, Z.YoB, 1 )
There must be at least one year between consecutive siblings. In fact, we can be a lot more adventurous than this. If we know the dates of birth of the mth and nth children, and Z is known to be the zth child in between them, then we can assert tiemin(M.YoB, Z.YoB, z-m) and tiemin(Z.YoB, N.YoB, n-z).
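The repeated scanning can be sketched as a loop that keeps applying every tie until nothing changes. In this sketch each tie combines a tiemin and a tiemax into one tuple (a, b, least, most), meaning that event b falls between least and most years after event a; that packaging, the dictionary of ranges, and the name `propagate` are assumptions made for brevity, not the layout of any real program.

```python
def propagate(ranges, ties):
    """Narrow the year ranges until no tie can tighten them further.

    ranges: dict mapping an event name to a (lo, hi) tuple of years.
    ties:   list of (a, b, least, most), asserting
            least <= year(b) - year(a) <= most.
    """
    changed = True
    while changed:
        changed = False
        for a, b, least, most in ties:
            a_lo, a_hi = ranges[a]
            b_lo, b_hi = ranges[b]
            new_a = (max(a_lo, b_lo - most), min(a_hi, b_hi - least))
            new_b = (max(b_lo, new_a[0] + least), min(b_hi, new_a[1] + most))
            for name, new in ((a, new_a), (b, new_b)):
                if new[0] > new[1]:
                    raise ValueError(f"contradictory ties for {name}: {new}")
                if new != ranges[name]:
                    ranges[name] = new
                    changed = True
    return ranges

# One exactly known birth; mother and grandmother unknown.
# Each is assumed to be a first (or only) child, born when
# the mother was between 14 and 55.
years = {
    "child": (1900, 1900),
    "mother": (1066, 2010),
    "grandmother": (1066, 2010),
}
ties = [
    ("grandmother", "mother", 14, 55),
    ("mother", "child", 14, 55),
]
print(propagate(years, ties))
```

With the ties listed in this order, the grandmother's range only tightens fully on the second pass, which is why the scan has to be repeated. The mother ends up in 1845..1886 and the grandmother in 1790..1872.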

There are a number of other ties, including assumptions about how to narrow down a year of marriage. Most constraints occur in pairs, so as to express the min and max values for the constraint. Events, such as getting married, or being present at someone else's wedding, or appearing on a photograph (at a certain age, or range of ages), can all be used to constrain the dates of birth and death of the person.

This is no magic panacea, of course. If one date of birth is known exactly in one portion of the tree, that person's mother's date of birth can be inferred within a range of 41 years, and the maternal grandmother's within a range of 82 years, and so on, with the range widening linearly at each generation. (Because of the symmetry, this growth rule works in both directions on the tree, so that the dates of birth of the aunts are also then known within a range of about 123 years). The ranges grow even faster along the male lines, due to the lack of the male menopause. Luckily, information ripples in from other sources around the tree, and combines in a way that tends to narrow the ranges down from this. The amount of information that the computer program can infer, therefore, though limited, can still be of some help in narrowing down the search in birth, death and marriage registers.

Towards an Algebra for Family Relationships

It is then tempting to contemplate allowing other compound relationships to be expressed, such as "my uncle's cousin". The obvious notation for expressing this is to use an addition operator, as in (2y1)+(2y2). Having done this, the next obvious step would be to look for ways to evaluate such expressions, in this case ending up with (3y2).

This takes one relationship that has been expressed with respect to someone else, and adds it to the relationship of that someone else to me, to give the first subject's relationship to me. The inverse operation, then, would also be extremely useful: to allow relationships to be translated from one person's perspective to another's. Not surprisingly, therefore, this sort of question can be expressed as a subtraction of the two relationships.

For example, (0y0)–(2x0) represents the question "what relation is self from my grandmother's perspective?" The answer, of course, is (0y2). In general, (0z0)–(MwN) gives (NzM), with the two integers of the other person reversed, and keeping the first person's gender (where z stands for the x or y, as appropriate). This correctly predicts that my fourth-cousin-twice-removed-senior, for example, sees me as his fourth-cousin-twice-removed-junior.

Notice, also, that (MwN)–(0z0) gives (MwN), and so is the identity operation for this operator.

Since navigating round a tree involves sequential steps, relationships lend themselves to being represented as distances.

A more adventurous question might be (2y1)–(0x1), namely, "what relation is my uncle from my daughter's perspective?" The answer is (3y1), with the second integer of "my daughter" being added to the first integer of "my uncle".

Unfortunately, the answers are not always this straightforward, especially when changing perspective to a senior generation. For example, if my daughter asked the inverse question, (3y1)–(1y0), one possible answer is, indeed, (2y1), but the complete answer is the set {2y1, 0y0s2y1}, indicating that, from her father's perspective, her great-uncle is either his uncle, or his spouse's uncle, depending on which side of the family tree the great-uncle relationship lies. In general, the size of the set doubles for each generation further up the tree.

It is next tempting to wonder about cascading these operations, as in my father's father's father, (1y0)+(1y0)+(1y0). As a convention, therefore, the association is, by default, from left to right: ((1y0)+(1y0))+(1y0). This, then, gives an obvious meaning to multiplication by an integer, such as (2y0)+7.(1y0) (which evaluates to my 7-greats grandfather, (9y0)).

At this point, it is evident that information is being lost in the addition operation. My father's father is specifically my paternal grandfather, but this evaluates merely to "my grandfather". This is why (3y1)–(1y0) ended up as needing to represent its result as a set, in an attempt to restore information that had already been lost.

The addition operator turns out to be associative, but not commutative. By default, (1y0)+(1x0)+(1y0)+(1x0) is evaluated as (((1y0)+(1x0))+(1y0))+(1x0), to give my great-grandfather's mother; but the answer would be the same if it were associated as (1y0)+((1x0)+((1y0)+(1x0))), to give my father's great-grandmother, or as ((1y0)+(1x0))+((1y0)+(1x0)) to give my grandmother's grandmother. But (1y0)+(1x0) gives my (paternal) grandmother, while (1x0)+(1y0) gives my (maternal) grandfather. However, the identity operation does still work either way: (0y0)+(7x3)=(7x3)+(0x0)=(7x3); but note that the first expression involves the speaker's self, and might be male or female, while the second expression stands for the female relation's self.

The lack of commutativity is more than just a problem of gender, though: my daughter's great-uncle, (0x1)+(3y1), is either my uncle (2y1) or my spouse's uncle (0y0s2y1), but my great-uncle's daughter, (3y1)+(0x1), is my first-cousine-once-removed-senior (3x2). The most that can be said about this is that both evaluate to a relationship in the same generation.

Finally, a first attempt can be made at mechanising the evaluation of these expressions. Starting with the subtraction operation, we can reason that (AbC)–(EfG) can evaluate to (UvW) where U=G+max(A-E,0), v=b, W=C+max(E-A,0). The reasoning is that the common ancestor of (AbC) and me is (Cy0) with respect to (AbC), and that of (EfG) and me is (Gy0) with respect to (EfG). If both of these common ancestors are in the same tree, then one is higher than the other, and the higher one is the common ancestor of both (AbC) and (EfG), and is (max(A,E)y0) with respect to me, and hence either (Cy0) or ((C+E-A)y0) with respect to (AbC), and ((G+A-E)y0) or (Gy0), respectively, with respect to (EfG).

From this, we find that the inverse-relationship operation, (0b0)–(EfG), correctly evaluates to (GbE). Then, the addition operation, (AbC)+(EfG), can be obtained from (EfG)-((0y0)-(AbC)). This says that R1+R2=R2-(self-R1), namely that R2, which is expressed as a relationship with respect to R1, can be added to R1 if I first put myself in R1's position. R1 sees me as a (self-R1) relationship, and the subtraction operation allows R2 to be expressed in this person's (that is, my) context. Thus, (AbC)+(EfG) evaluates to (UvW) where U=A+max(E-C,0), v=f, W=G+max(C-E,0).
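The two formulas translate directly into code. The sketch below represents a relationship as a tuple (steps up, sex, steps down); that representation and the names `sub`, `add`, `times` and `show` are assumptions of the sketch. It produces only the single value given by the formulas, not the full set of alternatives.

```python
def sub(r1, r2):
    """(AbC) - (EfG): relation r1 as seen from r2's position."""
    a, b, c = r1
    e, _f, g = r2
    return (g + max(a - e, 0), b, c + max(e - a, 0))

def add(r1, r2):
    """(AbC) + (EfG): r2, stated relative to r1, restated relative to me."""
    a, _b, c = r1
    e, f, g = r2
    return (a + max(e - c, 0), f, g + max(c - e, 0))

def times(n, r):
    """n.(R): the relationship r added to itself n times, for n >= 1."""
    total = r
    for _ in range(n - 1):
        total = add(total, r)
    return total

def show(r):
    """Write a tuple back in the compact notation, e.g. (3, 'y', 2) -> '3y2'."""
    return f"{r[0]}{r[1]}{r[2]}"

uncle, cousin = (2, "y", 1), (2, "y", 2)
print(show(add(uncle, cousin)))                       # my uncle's cousin
print(show(sub((0, "y", 0), (2, "x", 0))))            # me, seen by my grandmother
print(show(sub(uncle, (0, "x", 1))))                  # my uncle, seen by my daughter
print(show(add((2, "y", 0), times(7, (1, "y", 0)))))  # (2y0)+7.(1y0)
```

The four lines printed are 3y2, 0y2, 3y1 and 9y0, agreeing with the worked examples earlier on this page.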

From here, other properties of the operators can be explored. For example, that R1-R2-R3 does indeed evaluate to R1-(R2+R3), and that R1+(R2+R3)=(R1+R2)+R3 using the intermediate steps of (R3-(self-R2))-(self-R1) evaluating to the same as R3-(self-(R2-(self-R1))).
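Such properties can also be explored by brute force over all small relationships. The sketch below repeats the two formulas so that it runs on its own; the tuple representation, the helper names, and the limit of 4 steps in each direction are assumptions, and an exhaustive check over a small range is evidence rather than proof.

```python
from itertools import product

SELF = (0, "y", 0)

def sub(r1, r2):
    """(AbC) - (EfG) -> (G+max(A-E,0), b, C+max(E-A,0))."""
    a, b, c = r1
    e, _f, g = r2
    return (g + max(a - e, 0), b, c + max(e - a, 0))

def add(r1, r2):
    """(AbC) + (EfG) -> (A+max(E-C,0), f, G+max(C-E,0))."""
    a, _b, c = r1
    e, f, g = r2
    return (a + max(e - c, 0), f, g + max(c - e, 0))

def small_relations(limit):
    """Every (up, sex, down) with both integers below limit."""
    return [(up, sex, down)
            for up in range(limit) for sex in "xy" for down in range(limit)]

def properties_hold(limit=4):
    rels = small_relations(limit)
    for r1, r2 in product(rels, repeat=2):
        # R1+R2 = R2-(self-R1)
        if add(r1, r2) != sub(r2, sub(SELF, r1)):
            return False
    for r1, r2, r3 in product(rels, repeat=3):
        # R1-R2-R3 = R1-(R2+R3)
        if sub(sub(r1, r2), r3) != sub(r1, add(r2, r3)):
            return False
        # R1+(R2+R3) = (R1+R2)+R3
        if add(r1, add(r2, r3)) != add(add(r1, r2), r3):
            return False
    return True

print(properties_hold())
```

The same loop structure can be used to find counter-examples to properties that do not hold, such as commutativity: (1y0)+(1x0) and (1x0)+(1y0) differ in the sex of the result.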

Significantly, though, the above reasoning has relied on the assumption that, "both of the common ancestors (of (AbC) and me, and of (EfG) and me) are in the same tree". If they are, they are abs(A-E) generations apart, navigating along the same path in the tree. If they are not, the evaluation becomes more complicated, and has to produce a set of results. At least, though, this page has achieved what it claimed: "Towards an Algebra for Genealogy" has only made a start, and is not yet complete. In the same way that we usually expect the functions sqrt(x), log(x) and arcsin(x) to evaluate to just one simple value (the principal value), not an exhaustive list of values, so we have at least produced an evaluation mechanism that generates the principal value when adding or subtracting two relationships.

© Malcolm Shute, Valley d'Aigues Research, 2006-2010