Relational Algebra

What is an algebra ?

A formal system of manipulation of symbols to deal with general statements of relations.

Algebraic systems are abstract: in the sense that a system only provides a set of symbols, and set of rules to construct valid expressions, and a set of rules by which expressions and symbols can be manipulated.

The algebra we learn in high school is the algebra of real numbers. All symbols for variables represented real numbers, and all manipulations of symbols gave relations between some real numbers.

Later we learnt the Algebra of Complex numbers.. which worked pretty much the same way.

Now we learn the Algebra of relations, where every relation is represented by a relational table as defined earlier: An array of atomic values, with no ordering of tuples (rows), and strictly ordered tuple-attributes (columns).

Just as real algebra has operations (+, -, x, /), Relational Algebra also has operations.

 

The main RA operations: Select, Project, Join, Divide

 

Notice that the result of any operation in real algebra is also a real number. Thus, for real numbers x, y: (x + y) is real. So is ( x - y). So is ( x / y) for all values where ‘/’ is defined.

[Give an example where it is not defined !]

 

Similarly, in RA, whenever an operation is defined the result of an RA operation is a relational table.

 

Why is this important ?

 

[Because this allows us to combine a sequence of RA operations in arbitrary order !]

 

Why is it important to combine arbitrary sequences of RA operations ?

 

We have already seen why any given amount of data can be efficiently and intelligently stored in a set of relational tables. The relational schema, if properly designed, guarantees minimum repetition, no update anomalies, and efficient data retrieval.

 

Now we shall see that extraction/modification of ANY information in a relational schema can be done by some combination of RA operations !

 

We shall demonstrate with the following tables:

EMPLOYEE
Name ID SupervisorID DeptNo
John 111 222 5
Frankie 222 777 4
Alice 333 444 4
Jennifer 444 777 4
William 555 222 5
Joyce 666 222 5
James 777 222 5
John 888 null 1

WORKS_ON
IDno ProjNo Hours
111 1 2.5
111 2 1.5
555 3 2.5
666 1 3.5
666 2 3.5
222 2 2.5
222 3 5.5
222 1 5.5

DEPENDENT
EmpID DepName Relationship
222 Jack Son
222 Jill Wife
222 John Son
444 Ted Son
111 Mike Son
111 Anita Daughter


Select Operation:

A relational table is composed of an unordered set of tuples (rows).

The SELECT operation allows us select a SUBSET of the tuples of a relational table, which satisfy some specified conditions.

SYNTAX:

SELECT [conditions] ( TABLE )

Example:

SELECT [DeptNo = 5] ( EMPLOYEE )

OUTPUT
Name ID SupervisorID DeptNo
John 111 222 5
William 555 222 5
Joyce 666 222 5
James 777 222 5

 

All tuples in relational table EMPLOYEE, for which the condition [DeptNo = 5] was TRUE, were placed in the OUTPUT.

NOTES:

1. How does SELECT work ?

It looks at each tuple of the table, and evaluates the specified conditions. If the conditions are true, then that tuple is placed in OUTPUT. Otherwise it is rejected.

2. The result of the SELECT operation is also a Relational table !

In fact, it is a subset of all the tuples of the table EMPLOYEE.

3. Selection conditions MUST always evaluate to LOGICAL values: TRUE, or FALSE. Hence all conditions are EXPRESSIONS, connected by LOGICAL Operators (AND, OR, NOT).

Example:

SELECT [ (DeptNo != 4) AND ( (ID = 222) OR (ID = 111)) ] ( EMPLOYEE)

OUTPUT
Name ID SupervisorID DeptNo
John 111 222 5

Project Operation:

The SELECT operation outputs a subset of the rows of a Relational Table. In contrast, the PROJECT operation outputs a subset of the columns of the Relational Table.

SYNTAX:

PROJECT [attribute-list] ( TABLE )

Example:

PROJECT [ Name, ID] ( EMPLOYEE )

OUTPUT
Name ID
John 111
Frankie 222
Alice 333
Jennifer 444
William 555
Joyce 666
James 777
John 888

NOTES:

1. Once again, the result of the PROJECT operation is another Relational Table.

2. Since Relational tables are a set of tuples, if PROJECT will not output identical tuples twice !

Example:

PROJECT [EmpID, Relationship] ( DEPENDENT )

OUTPUT
EmpID Relationship
222 Son
222 Wife
444 Son
111 Son
111 Daughter

Of course, since you can have any combination of RA Operations, the following example is totally valid:

PROJECT [Name, ID] ( SELECT [ SupervisorID = 222] ( EMPLOYEE) )

Which gets evaluated in two steps: First, the SELECT returns OUTPUT1, and then, the PROJECT returns the final OUTPUT.

Step 1:

OUTPUT1
Name ID SupervisorID DeptNo
John 111 222 5
William 555 222 5
Joyce 666 222 5
James 777 222 5

and then Step 2:

OUTPUT2
Name ID
John 111
William 555
Joyce 666
James 777

Join Operations:

As we have seen, the entire process of Normalization broke down all our data into many small tables. Of course, very often we need to get information from not one, but a combination of two or more tables. RA allows us to do so by providing an operator called JOIN. This operator combines the information in two Relational Tables. In particular, we look at the operation THETA-JOIN.

SYNTAX:

THETA-JOIN [conditions] ( TABLE1, TABLE2)

In the above, TABLE1 and TABLE2 can possibly be the same table.

THETA-JOIN forms combinations of the tables TABLE1 and TABLE2. The output is another table, with all the attributes of TABLE1 and all attributes of TABLE2.

How it works:

  1. Since Relational Tables are sets of tuples, first compute the CARTESIAN PRODUCT of the tables.
    If TABLE1 has A rows, and B columns; TABLE2 has N rows and M columns, then:
    Each tuple in the Cartesian product will have ( B + M) attributes.
    The Cartesian product will have A*N rows.
  2. For each row in the cartesian product, check if [conditions] evaluetes to TRUE or FALSE. If [conditions] are TRUE, add the tuple to OUTPUT.

Example:

THETA-JOIN [ID = IDno] ( EMPLOYEE, WORKS_ON )

 

OUTPUT
Name ID SupervisorID DeptNo IDno ProjNo Hours
John 111 222 5 111 1 2.5
John 111 222 5 111 2 1.5
Frankie 222 777 4 222 2 2.5
Frankie 222 777 4 222 3 5.5
Frankie 222 777 4 222 1 5.5
William 555 222 5 555 3 2.5
Joyce 666 222 5 666 1 3.5
Joyce 666 222 5 666 2 3.5

NOTES:

Both, THETA-Join and NATURAL-JOIN operations are called INNER-JOIN operations. These have one common property:

If there is no tuple from TABLE2 matching the conditions with a tuple of TABLE1, then that tuple of TABLE1 does not occur in the OUTPUT.

Sometimes, when performing JOIN operations, we may specifically require that all tuples of TABLE1 must occur in the OUTPUT. If no matching tuples are found in TABLE2, then just enter NULL values for the attributes related to TABLE2. This operation is called a LEFT-OUTER-JOIN.

Example:

LEFT-OUTER-JOIN [ID = EmpID] ( EMPLOYEE, DEPENDENT )

OUTPUT
Name ID SupervisorID DeptNo EmpID DepName Relationship
John 111 222 5 111 Mike Son
John 111 222 5 111 Anita Daughter
Frankie 222 777 4 222 Jack Son
Frankie 222 777 4 222 Jill Wife
Frankie 222 777 4 222 John Son
Alice 333 444 4 null null null
Jennifer 444 777 4 444 Ted Son
William 555 222 5 null null null
Joyce 666 222 5 null null null
James 777 222 5 null null null
John 888 null 1 null null null

Similarly, we can define a RIGHT-OUTER-JOIN, where all tuples of TABLE2 appear at least once, with null values for tuples of TABLE1 when there is no match.


Set theoretic Operations: Union, Intersection, Difference, Division.

Since relational tables are sets of tuples, you are allowed to do any set-theoretic operation on them.

Union:

If two tables have the same attributes, you can perform a union of the two.

SYNTAX:

UNION ( TABLE1, TABLE2)

Example:

UNION ( (SELECT [EmpID = 222] (DEPENDENTS)), ( SELECT [EmpID = 444] ( DEPENDENTS) ) )

OUTPUT
EmpID DepName Relationship
222 Jack Son
222 Jill Wife
222 John Son
444 Ted Son

Intersection:

Performs set intersection on the tables.

SYNTAX:

INTERSECTION ( TABLE1, TABLE2)

Example:

INTERSECTION (( (SELECT [EmpID = 222] (DEPENDENTS)), ( SELECT [Relationship = SON] ( DEPENDENTS) ) )

OUTPUT
EmpID DepName Relationship
222 Jack Son
222 John Son

Difference:

Performs set difference (every element of first set which is NOT a member of the second set is output).

SYNTAX:

DIFFERENCE ( TABLE1, TABLE2)

Example:

DIFFERENCE (( (SELECT [EmpID = 222] (DEPENDENTS)), ( SELECT [Relationship = SON] ( DEPENDENTS) ) )

OUTPUT
EmpID DepName Relationship
222 Jill Wife

NOTE:

DIFFERENCE ( TABLE1, TABLE2) is NOT the SAME as

DIFFERENCE( TABLE2, TABLE1).


DivideBy:

You may also perform a set division operation on two tables. The operation is described using the simple tables and example that follow.

SYNTAX:

DIVIDEBY ( TABLE1, TABLE2)

The result is defined as follows:

  1. Attributes of TABLE2 must be a proper subset of attributes of TABLE1. If not, the result is NULL.

  2. We divide the attributes of TABLE1 into two subsets: T1unique, T1commonT2. T1unique has those attributes not shared by TABLE2, and T1commonT2 are all attributes which are in TABLE2.

  3. The attributes in the DIVIDEBY will be exactly the same as T1unique.

  4. For each unique value of T1unique in TABLE1, check if there is a tuple corrsponding to each tuple of TABLE2, in T1commonT2. If so, this value of T1unique will appear in the result.

  5. Repeat this check for each unique value of T1unique in TABLE1.

This operation is best explained with a simple example:

Problem: Find the ID of Employees who work on ALL the projects.

You can try to use the other RA functions, and will soon discover that this query is not easy to solve. The best way to answer such For-ALL queries is the use of the DIVIDEBY operation, as follows.

We first find a table listing 'ALL the Projects':

PROJECTS = PROJECT[ ProjNo] ( WORKS_ON)

PROJECTS
ProjNo
1
2
3
Next, we find which employee is working on what projects:

E_WORKS_ON = PROJECT[ ID, ProjNo]( WORKS_ON)

E_WORKS_ON
ID ProjNo
111 1
111 2
222 2
222 3
222 1
555 3
666 1
666 2

 

Now, we can find the answer we want by using the following query:

DIVIDEBY ( E_WORKS_ON, PROJECTS); The result is computed in the following way:

1. We note that T1unique = {ID}, T1commonT2 = {ProjNo}

2. The result therefore must be a single column table, with one attribute (ID).

3. Take the first unique value of T1unique: ID = 111.

4. For each value of ProjNo in PROJECTS, do we have a tuple in E_WORKS_ON corresponding to ID = 111 ?

NO: because there is no tuple in E_WORKS_ON with ID = 111, and ProjNo = 3.

Therefore, ID = 111 does not appear in the result.

5. Repeat step 4 for next unique value of T1unique, that is, Id = 222.

In this case, the test is true, and therefore ID = 222 appears in the result.

6. Repeat step 4 noxt unique value of T1unique, ID = 555.

The test again fails, and therefore ID = 555 is not in the result.

7. Similarly, the test fails for ID = 666; thus, ID = 666 will not appear in the output.

8. Since there are no more unique values of T1unique left, we are done, and the result is the following table:

RESULT
ID
222

This concludes our coverage of all important Relational Algebra operations. It can be shown (though we shall not do so) that RA is complete, in the sense that any modification you wold like to do on any subset of a set of relational tables can be done using some combination of RA commands. It is, as we have seen, an elegant mathematical tool. Nortice that RA is procedural -- an RA expression indicates the sequence of performing the operations, and therefore imposes some constraint on the use of smart algorithms to perhaps improve the performance of some queries. It is easy to construct two queries that will yield the same result, but require different amount of effort to compute.

One language that can logically describe the output of a query, without imposing any restrictions on the procedure that must be used to compute the output is Relational Calculus (RC). RC is equivalent to RA, in the sense that any output that can be expressed in RA can also be expressed in RC, and vice versa. In the following lectures, we shall study the most popular of all DML's for relational DBMS's: Structutred Query Language (SQL). SQL is based on relational calculus.