Relational Algebra

To sum up what we have learned in the preceding two sections (about Algebraic structures and Relations):

Relational algebra consists of a number of operations
These operations are defined over relations
A relation is a set of tuples
A table in a relational database defines a relation extensionally, ie by explicitly listing tuples for which the relation holds

Example relations in JSON

To demystify all of this, let’s consider two JSON structures:

My friends

[
  { "name": "Emily", "yearsKnown": 10 },
  { "name": "Stanisław", "yearsKnown": 15 },
  { "name": "Juan", "yearsKnown": 32 },
  { "name": "Ali", "yearsKnown": 27 }
]

Your friends

[
  { "name": "Aisha", "yearsKnown": 35 },
  { "name": "Belle", "yearsKnown": 17 },
  { "name": "Stanisław", "yearsKnown": 2 }
]

We can view each of the objects as a tuple with two attributes, and each array as an extensional definition of a relation. There is another important aspect of this representation: the attributes have a name and a type. In this case, each tuple have an attribute name of type string and an attribute yearsKnown of type integer.

Union (∪)

So here we have two relations: My friends and Your friends. Now, let’s say we’re planning a party and want to invite both my friends and yours. How do we find out who to invite? By combining our friend relations, of course! To do that, we introduce our very first operation from relational algebra: Union (∪).

MyFriends ∪ YourFriends

Or in pseudocode:

union(myFriends, yourFriends)

The result is a new relation:

[
  { "name": "Emily", "yearsKnown": 10 },
  { "name": "Stanisław", "yearsKnown": 15 }
  { "name": "Juan", "yearsKnown": 32 },
  { "name": "Ali", "yearsKnown": 27 }
  { "name": "Aisha", "yearsKnown": 35 },
  { "name": "Belle", "yearsKnown": 17 },
  { "name": "Stanisław", "yearsKnown": 2 }
]

Note that Union is only possible between relations which are union-compatible, ie their tuples have the same number of attributes, with the same names (and types, also called domains).

There is a problem here, of course, because Stanisław is now included twice. As we have seen, a very important fact about relations is that they are sets, in the mathematical sense, and therefore never contain more than one instance of each tuple. But since the two Stanisław tuples are different (because they do not have the same value for yearsKnown) they are both included.

Project (π)

We would like to get rid of the yearsKnown attribute, and to do that we use another operator: Project (π), which lets us select which attributes we want from a relation.

π(name)(MyFriends ∪ YourFriends)

Pseudocode:

project(["name"], union(myFriends, yourFriends))

The output is the following relation, where every friend is included only one time.

[
  { "name": "Emily" },
  { "name": "Stanisław" }
  { "name": "Juan" },
  { "name": "Ali" }
  { "name": "Aisha" },
  { "name": "Belle" },
]

Note that we could rewrite this expression as:

π(name)(MyFriends) ∪ π(name)(YourFriends)

Pseudocode:

union(
  project(["name"], myFriends),
  project(["name"], yourFriends)
)

Here, we get a first taste of the algebraic flavor of relational algebra. Project distributes over Union in the same way that multiplication distributes over addition: a(b+c) = ab + ac.

Tables, headings and rows

Let’s now move on from JSON and settle on the standard way of talking about relations. As we have seen, a relation is a set of tuples, and those tuples must all share the same number of attributes, which must have the same names and domains. The usual way of visualizing this is as a table:

Name : String	YearsKnown : Integer
Emily	10
Stanisław	15
…	…

The heading of the table defines the names of the attributes and their domains (or types). Each row represents a tuple and each column lists one attribute of the tuples. The set of tuples itself is sometimes called the body of the relation.

Every operation of relational algebra takes one or two relations as input and produces a new relation. The new relation will sometimes have the same heading as the input relation(s), but more often it will produce a new heading. Of the examples we have seen above, the output of Project will have a different heading than the input, whereas Union preserves the headings of the input relations.

This means that Union acts only on the set of tuples of the relation, not the heading. That being the case, Union is identical to set union.

Difference (−)

What if we want to do the opposite of Union? Let’s say we know the full list of friends (AllFriends = MyFriends ∪ YourFriends) and we know who my friends are. We can then find out who your friends are using Difference (−). Like Union, Difference is identical to the set difference: given a set A and a set B, it removes from A those values (in this case tuples) that are also included in B.

YourFriends = AllFriends − MyFriends

Pseudocode:

YourFriends = minus(AllFriends, MyFriends)

YourFriends

Name	YearsKnown
Aisha	35
Belle	17
Stanisław	2

Finding Common friends

To illustrate a couple more of the fundamental operators of relational algebra, let’s say we want to answer this question: For those friends that we have in common, how long have I and you known that person?

We can see by glancing at the JSON above that Stanisław is the only friend we have in common, and the number of years we have known him is 15 (for me) and 2 (for you). But how can we use relational algebra to get the same result?

The first idea that comes to mind might be to combine the two relations (using Union) and then somehow selecting those tuples we have in common. But how would we do that? The tuples would not be identical, since the YearsKnown attribute will be different (that’s the whole point), and if they were identical, only one would show up since tuples in a relation are unique.

We need a way to combine tuples so that my attributes and your attributes end up in the same tuple. Once we have done that, we can filter out those tuples that represent common friends. Let’s see how to do that.

Rename (ρ)

The first problem is that the attributes in both relations have the same names (namely Name and YearsKnown). Since attribute names must be unique, we start by using the Rename operator, denoted ρ. ρ(X/Y) means that we rename the attribute Y to X.

MyFriends′ = ρ(Name1/Name,YearsKnown1/YearsKnown)(MyFriends)
YourFriends′ = ρ(Name2/Name,YearsKnown2/YearsKnown)(YourFriends)

Pseudocode:

MyFriends' = rename(MyFriends, Name => Name1, YearsKnown => YearsKnown1)

MyFriends’

Name1	YearsKnown1
Emily	10
Stanisław	15
…	…

Cross product (×)

Now that we have two relations with no attribute names in common, we can combine their tuples using the Cross product operation, denoted ×. This operations takes every tuple from its first operand and combines each with every tuple from the other operand. So if, for example, the operands have 5 tuples each, the resulting relation will have 25 tuples. The result will also have all attributes from the two input relations.

C = MyFriends′ × YourFriends′

Pseudocode:

C = cross_product(MyFriends', YourFriends')

Since every row of the first relation is combined with every row of the second relation, the resulting relation contains 12 (4*3) tuples:

C

Name1	YearsKnown1	Name2	YearsKnown2
Emily	10	Aisha	35
Emily	10	Belle	17
Emily	10	Stanisław	2
Stanisław	15	Aisha	35
Stanisław	15	Belle	17
…	…	…	…
Ali	27	Belle	17
Ali	27	Stanisław	2

Now that we have every possible combination of tuples, we can pick out the ones we are actually interested in, which are those in which Name1 and Name2 are identical, because those represent the friends we have in common.

Restrict (σ)

To select tuples we are interested in, we use the Restrict operation.

CommonFriends = σ(Name1 = Name2)(C)

Pseudocode:

CommonFriends = restrict(C, Name1 = Name2)

CommonFriends

Name1	YearsKnown1	Name2	YearsKnown2
Stanisław	15	Stanisław	2

And here we have our single common friend.

Project

All that remains to do is to project the two attributes we want: the amount of years each of us have known our common friend:

Years = π(YearsKnown1,YearsKnown2)(CommonFriends)

Pseudocode:

Years = project(CommonFriends, [YearsKnown1, YearsKnown2]

Years

YearsKnown1	YearsKnown2
15	2

Intersection (∩)

If we only wanted to find the names of our common friends, we could use Intersection (∩), which (like Union and Difference) is a set operation. Given relations A and B, Intersection constructs a relation containing only those tuples they have in common.

Like with Union and *Difference, the two input relations must be union-compatible, ie have identical headings, and the output relation will also have the same heading as the two inputs.

To obtain the names of our common friends, we first project only Name from MyFriends and YourFriends and then perform Intersection:

CommonFriendNames = π(name)(MyFriends) ∪ π(name)(YourFriends)

Pseudocode:

CommonFriendNames = intersection(project(MyFriends, [Name]),
                                 project(YourFriends, [Name]))

CommonFriendNames

Name
Stanisław

Minimal Relational Algebra

The operations presented up to this point form a complete relational algebra: it can be proved that any transformation that might be considered the core of the relational model can be constructed out of these operations. In fact, we could make do with just five: Restrict, Project, Cross product, Union, and Difference. Any other operations can be derived from just these five.

Nevertheless, several other operations are commonly introduced. Some are essentially convenient shortcuts for combinations of the core operations. Most important of these are Join (⋈) which we will introduce in the next section.

Other operations extend the capabilities beyond the core of relational model, most notably the Group and Summarize operations. Group is presented briefly in the section on Relations as Attributes. For more information on Summarize, consult the operation reference.