# Relational Algebra

To sum up what we have learned in the preceding two sections (about Algebraic structures and Relations):

- Relational algebra consists of a number of operations
- These operations are defined over relations
- A relation is a set of tuples
- A table in a relational database defines a relation extensionally, ie by explicitly listing tuples for which the relation holds

## Example relations in JSON

To demystify all of this, let’s consider two JSON structures:

##### My friends

##### Your friends

We can view each of the objects as a tuple with two attributes, and each array as an extensional definition of a relation. There is another important aspect of this representation: the attributes have a name and a type. In this case, each tuple have an attribute `name`

of type string and an attribute `yearsKnown`

of type integer.

### Union (∪)

So here we have two relations: **My friends** and **Your friends**. Now, let’s say we’re planning a party and want to invite both my friends and yours. How do we find out who to invite? By combining our friend relations, of course! To do that, we introduce our very first operation from relational algebra: **Union** (**∪**).

Or in pseudocode:

The result is a new relation:

Note that **Union** is only possible between relations which are *union-compatible*, ie their tuples have the same number of attributes, with the same names (and types, also called *domains*).

There is a problem here, of course, because Stanisław is now included twice. As we have seen, a very important fact about relations is that they are *sets*, in the mathematical sense, and therefore never contain more than one instance of each tuple. But since the two Stanisław tuples are different (because they do not have the same value for `yearsKnown`

) they are both included.

### Project (π)

We would like to get rid of the `yearsKnown`

attribute, and to do that we use another operator: **Project** (**π**), which lets us select which attributes we want from a relation.

Pseudocode:

The output is the following relation, where every friend is included only one time.

Note that we could rewrite this expression as:

Pseudocode:

Here, we get a first taste of the algebraic flavor of relational algebra. **Project** *distributes over* **Union** in the same way that multiplication distributes over addition: *a(b+c) = ab + ac*.

## Tables, headings and rows

Let’s now move on from JSON and settle on the standard way of talking about relations. As we have seen, a relation is a set of tuples, and those tuples must all share the same number of attributes, which must have the same names and domains. The usual way of visualizing this is as a table:

Name : String | YearsKnown : Integer |
---|---|

Emily | 10 |

Stanisław | 15 |

… | … |

The *heading* of the table defines the names of the attributes and their domains (or types). Each row represents a tuple and each column lists one attribute of the tuples. The set of tuples itself is sometimes called the *body* of the relation.

Every operation of relational algebra takes one or two relations as input and produces a new relation. The new relation will sometimes have the same heading as the input relation(s), but more often it will produce a new heading. Of the examples we have seen above, the output of **Project** will have a different heading than the input, whereas **Union** preserves the headings of the input relations.

This means that **Union** acts *only* on the set of tuples of the relation, not the heading. That being the case, **Union** is identical to **set union**.

### Difference (−)

What if we want to do the opposite of **Union**? Let’s say we know the full list of friends (`AllFriends = MyFriends ∪ YourFriends`

) and we know who my friends are. We can then find out who *your* friends are using **Difference (−)**. Like **Union**, **Difference** is identical to the **set difference**: given a set **A** and a set **B**, it removes from A those values (in this case tuples) that are also included in **B**.

Pseudocode:

##### YourFriends

Name | YearsKnown |
---|---|

Aisha | 35 |

Belle | 17 |

Stanisław | 2 |

## Finding Common friends

To illustrate a couple more of the fundamental operators of relational algebra, let’s say we want to answer this question: *For those friends that we have in common, how long have I and you known that person?*

We can see by glancing at the JSON above that Stanisław is the only friend we have in common, and the number of years we have known him is 15 (for me) and 2 (for you). But how can we use relational algebra to get the same result?

The first idea that comes to mind might be to combine the two relations (using **Union**) and then somehow selecting those tuples we have in common. But how would we do that? The tuples would not be identical, since the **YearsKnown** attribute will be different (that’s the whole point), and if they *were* identical, only one would show up since tuples in a relation are unique.

We need a way to combine tuples so that my attributes and your attributes end up in the same tuple. Once we have done that, we can filter out those tuples that represent common friends. Let’s see how to do that.

### Rename (ρ)

The first problem is that the attributes in both relations have the same names (namely **Name** and **YearsKnown**). Since attribute names must be unique, we start by using the **Rename** operator, denoted **ρ**. **ρ(X/Y)** means that we rename the attribute **Y** to **X**.

Pseudocode:

##### MyFriends’

Name1 | YearsKnown1 |
---|---|

Emily | 10 |

Stanisław | 15 |

… | … |

### Cross product (×)

Now that we have two relations with no attribute names in common, we can combine their tuples using the **Cross product** operation, denoted **×**. This operations takes every tuple from its first operand and combines each with every tuple from the other operand. So if, for example, the operands have 5 tuples each, the resulting relation will have 25 tuples. The result will also have all attributes from the two input relations.

Pseudocode:

Since every row of the first relation is combined with every row of the second relation, the resulting relation contains 12 (4*3) tuples:

##### C

Name1 | YearsKnown1 | Name2 | YearsKnown2 |
---|---|---|---|

Emily | 10 | Aisha | 35 |

Emily | 10 | Belle | 17 |

Emily | 10 | Stanisław | 2 |

Stanisław | 15 | Aisha | 35 |

Stanisław | 15 | Belle | 17 |

… | … | … | … |

Ali | 27 | Belle | 17 |

Ali | 27 | Stanisław | 2 |

Now that we have every possible combination of tuples, we can pick out the ones we are actually interested in, which are those in which **Name1** and **Name2** are identical, because those represent the friends we have in common.

### Restrict (σ)

To select tuples we are interested in, we use the **Restrict** operation.

Pseudocode:

##### CommonFriends

Name1 | YearsKnown1 | Name2 | YearsKnown2 |
---|---|---|---|

Stanisław | 15 | Stanisław | 2 |

And here we have our single common friend.

#### Project

All that remains to do is to project the two attributes we want: the amount of years each of us have known our common friend:

Pseudocode:

##### Years

YearsKnown1 | YearsKnown2 |
---|---|

15 | 2 |

### Intersection (∩)

If we only wanted to find the names of our common friends, we could use **Intersection (∩)**, which (like **Union** and **Difference**) is a set operation. Given relations **A** and **B**, **Intersection** constructs a relation containing only those tuples they have in common.

Like with **Union** and **Difference*, the two input relations must be **union-compatible**, ie have identical headings, and the output relation will also have the same heading as the two inputs.

To obtain the names of our common friends, we first project only **Name** from **MyFriends** and **YourFriends** and then perform **Intersection**:

Pseudocode:

##### CommonFriendNames

Name |
---|

Stanisław |

## Minimal Relational Algebra

The operations presented up to this point form a complete relational algebra: it can be proved that any transformation that might be considered the core of the relational model can be constructed out of these operations. In fact, we could make do with just five: **Restrict**, **Project**, **Cross product**, **Union**, and **Difference**. Any other operations can be derived from just these five.

Nevertheless, several other operations are commonly introduced. Some are essentially convenient shortcuts for combinations of the core operations. Most important of these are **Join (⋈)** which we will introduce in the next section.

Other operations extend the capabilities beyond the core of relational model, most notably the **Group** and **Summarize** operations. **Group** is presented briefly in the section on Relations as Attributes. For more information on **Summarize**, consult the operation reference.