What is an algebra ?
A formal system of manipulation of symbols to deal with general statements of relations.
Algebraic systems are abstract: in the sense that a system only provides a set of symbols, and set of rules to construct valid expressions, and a set of rules by which expressions and symbols can be manipulated.
The algebra we learn in high school is the algebra of real numbers. All symbols for variables represented real numbers, and all manipulations of symbols gave relations between some real numbers.
Later we learnt the Algebra of Complex numbers.. which worked pretty much the same way.
Now we learn the Algebra of relations, where every relation is represented by a relational table as defined earlier: An array of atomic values, with no ordering of tuples (rows), and strictly ordered tuple-attributes (columns).
Just as real algebra has operations (+, -, x, /), Relational Algebra also has operations.
Notice that the result of any operation in real algebra is also a real number. Thus, for real numbers x, y: (x + y) is real. So is ( x - y). So is ( x / y) for all values where ‘/’ is defined.
[Give an example where it is not defined !]
Similarly, in RA, whenever an operation is defined the result of an RA operation is a relational table.
Why is this important ?
[Because this allows us to combine a sequence of RA operations in arbitrary order !]
Why is it important to combine arbitrary sequences of RA operations ?
We have already seen why any given amount of data can be efficiently and intelligently stored in a set of relational tables. The relational schema, if properly designed, guarantees minimum repetition, no update anomalies, and efficient data retrieval.
Now we shall see that extraction/modification of ANY information in a relational schema can be done by some combination of RA operations !
We shall demonstrate with the following tables:
| EMPLOYEE | |||
|---|---|---|---|
| Name | ID | SupervisorID | DeptNo |
| John | 111 | 222 | 5 |
| Frankie | 222 | 777 | 4 |
| Alice | 333 | 444 | 4 |
| Jennifer | 444 | 777 | 4 |
| William | 555 | 222 | 5 |
| Joyce | 666 | 222 | 5 |
| James | 777 | 222 | 5 |
| John | 888 | null | 1 |
| WORKS_ON | ||
|---|---|---|
| IDno | ProjNo | Hours |
| 111 | 1 | 2.5 |
| 111 | 2 | 1.5 |
| 555 | 3 | 2.5 |
| 666 | 1 | 3.5 |
| 666 | 2 | 3.5 |
| 222 | 2 | 2.5 |
| 222 | 3 | 5.5 |
| 222 | 1 | 5.5 |
| DEPENDENT | ||
|---|---|---|
| EmpID | DepName | Relationship |
| 222 | Jack | Son |
| 222 | Jill | Wife |
| 222 | John | Son |
| 444 | Ted | Son |
| 111 | Mike | Son |
| 111 | Anita | Daughter |
A relational table is composed of an unordered set of tuples (rows).
The SELECT operation allows us select a SUBSET of the tuples of a relational table, which satisfy some specified conditions.
SYNTAX:
SELECT [conditions] ( TABLE )
Example:
SELECT [DeptNo = 5] ( EMPLOYEE )
| OUTPUT | |||
|---|---|---|---|
| Name | ID | SupervisorID | DeptNo |
| John | 111 | 222 | 5 |
| William | 555 | 222 | 5 |
| Joyce | 666 | 222 | 5 |
| James | 777 | 222 | 5 |
All tuples in relational table EMPLOYEE, for which the condition [DeptNo = 5] was TRUE, were placed in the OUTPUT.
NOTES:
1. How does SELECT work ?
It looks at each tuple of the table, and evaluates the specified conditions. If the conditions are true, then that tuple is placed in OUTPUT. Otherwise it is rejected.
2. The result of the SELECT operation is also a Relational table !
In fact, it is a subset of all the tuples of the table EMPLOYEE.
3. Selection conditions MUST always evaluate to LOGICAL values: TRUE, or FALSE. Hence all conditions are EXPRESSIONS, connected by LOGICAL Operators (AND, OR, NOT).
Example:
SELECT [ (DeptNo != 4) AND ( (ID = 222) OR (ID = 111)) ] ( EMPLOYEE)
| OUTPUT | |||
|---|---|---|---|
| Name | ID | SupervisorID | DeptNo |
| John | 111 | 222 | 5 |
The SELECT operation outputs a subset of the rows of a Relational Table. In contrast, the PROJECT operation outputs a subset of the columns of the Relational Table.
SYNTAX:
PROJECT [attribute-list] ( TABLE )
Example:
PROJECT [ Name, ID] ( EMPLOYEE )
| OUTPUT | |
|---|---|
| Name | ID |
| John | 111 |
| Frankie | 222 |
| Alice | 333 |
| Jennifer | 444 |
| William | 555 |
| Joyce | 666 |
| James | 777 |
| John | 888 |
NOTES:
1. Once again, the result of the PROJECT operation is another Relational Table.
2. Since Relational tables are a set of tuples, if PROJECT will not output identical tuples twice !
Example:
PROJECT [EmpID, Relationship] ( DEPENDENT )
| OUTPUT | |
|---|---|
| EmpID | Relationship |
| 222 | Son |
| 222 | Wife |
| 444 | Son |
| 111 | Son |
| 111 | Daughter |
Of course, since you can have any combination of RA Operations, the following example is totally valid:
PROJECT [Name, ID] ( SELECT [ SupervisorID = 222] ( EMPLOYEE) )
Which gets evaluated in two steps: First, the SELECT returns OUTPUT1, and then, the PROJECT returns the final OUTPUT.
Step 1:
| OUTPUT1 | |||
|---|---|---|---|
| Name | ID | SupervisorID | DeptNo |
| John | 111 | 222 | 5 |
| William | 555 | 222 | 5 |
| Joyce | 666 | 222 | 5 |
| James | 777 | 222 | 5 |
and then Step 2:
| OUTPUT2 | |
|---|---|
| Name | ID |
| John | 111 |
| William | 555 |
| Joyce | 666 |
| James | 777 |
As we have seen, the entire process of Normalization broke down all our data into many small tables. Of course, very often we need to get information from not one, but a combination of two or more tables. RA allows us to do so by providing an operator called JOIN. This operator combines the information in two Relational Tables. In particular, we look at the operation THETA-JOIN.
SYNTAX:
THETA-JOIN [conditions] ( TABLE1, TABLE2)
In the above, TABLE1 and TABLE2 can possibly be the same table.
THETA-JOIN forms combinations of the tables TABLE1 and TABLE2. The output is another table, with all the attributes of TABLE1 and all attributes of TABLE2.
How it works:
Example:
THETA-JOIN [ID = IDno] ( EMPLOYEE, WORKS_ON )
| OUTPUT | ||||||
|---|---|---|---|---|---|---|
| Name | ID | SupervisorID | DeptNo | IDno | ProjNo | Hours |
| John | 111 | 222 | 5 | 111 | 1 | 2.5 |
| John | 111 | 222 | 5 | 111 | 2 | 1.5 |
| Frankie | 222 | 777 | 4 | 222 | 2 | 2.5 |
| Frankie | 222 | 777 | 4 | 222 | 3 | 5.5 |
| Frankie | 222 | 777 | 4 | 222 | 1 | 5.5 |
| William | 555 | 222 | 5 | 555 | 3 | 2.5 |
| Joyce | 666 | 222 | 5 | 666 | 1 | 3.5 |
| Joyce | 666 | 222 | 5 | 666 | 2 | 3.5 |
NOTES:
Thus, the OUTPUT should actually have attributes called: EMPLOYEE.Name, EMPLOYEE.ID, ..., WORKS_ON.IDno, ..., WORKS_ON.Hours.
For each tuple, t1i, of TABLE1 {
For each tuple, t2j, in TABLE2 {
if [conditions] are TRUE, add the combined tuple t1it2j to OUTPUT ; }}
Both, THETA-Join and NATURAL-JOIN operations are called INNER-JOIN operations. These have one common property:
If there is no tuple from TABLE2 matching the conditions with a tuple of TABLE1, then that tuple of TABLE1 does not occur in the OUTPUT.
Sometimes, when performing JOIN operations, we may specifically require that all tuples of TABLE1 must occur in the OUTPUT. If no matching tuples are found in TABLE2, then just enter NULL values for the attributes related to TABLE2. This operation is called a LEFT-OUTER-JOIN.
Example:
LEFT-OUTER-JOIN [ID = EmpID] ( EMPLOYEE, DEPENDENT )
| OUTPUT | ||||||
|---|---|---|---|---|---|---|
| Name | ID | SupervisorID | DeptNo | EmpID | DepName | Relationship |
| John | 111 | 222 | 5 | 111 | Mike | Son |
| John | 111 | 222 | 5 | 111 | Anita | Daughter |
| Frankie | 222 | 777 | 4 | 222 | Jack | Son |
| Frankie | 222 | 777 | 4 | 222 | Jill | Wife |
| Frankie | 222 | 777 | 4 | 222 | John | Son |
| Alice | 333 | 444 | 4 | null | null | null |
| Jennifer | 444 | 777 | 4 | 444 | Ted | Son |
| William | 555 | 222 | 5 | null | null | null |
| Joyce | 666 | 222 | 5 | null | null | null |
| James | 777 | 222 | 5 | null | null | null |
| John | 888 | null | 1 | null | null | null |
Similarly, we can define a RIGHT-OUTER-JOIN, where all tuples of TABLE2 appear at least once, with null values for tuples of TABLE1 when there is no match.
Since relational tables are sets of tuples, you are allowed to do any set-theoretic operation on them.
If two tables have the same attributes, you can perform a union of the two.
SYNTAX:
UNION ( TABLE1, TABLE2)
Example:
UNION ( (SELECT [EmpID = 222] (DEPENDENTS)), ( SELECT [EmpID = 444] ( DEPENDENTS) ) )
| OUTPUT | ||
|---|---|---|
| EmpID | DepName | Relationship |
| 222 | Jack | Son |
| 222 | Jill | Wife |
| 222 | John | Son |
| 444 | Ted | Son |
Performs set intersection on the tables.
SYNTAX:
INTERSECTION ( TABLE1, TABLE2)
Example:
INTERSECTION (( (SELECT [EmpID = 222] (DEPENDENTS)), ( SELECT [Relationship = SON] ( DEPENDENTS) ) )
| OUTPUT | ||
|---|---|---|
| EmpID | DepName | Relationship |
| 222 | Jack | Son |
| 222 | John | Son |
Performs set difference (every element of first set which is NOT a member of the second set is output).
SYNTAX:
DIFFERENCE ( TABLE1, TABLE2)
Example:
DIFFERENCE (( (SELECT [EmpID = 222] (DEPENDENTS)), ( SELECT [Relationship = SON] ( DEPENDENTS) ) )
| OUTPUT | ||
|---|---|---|
| EmpID | DepName | Relationship |
| 222 | Jill | Wife |
NOTE:
DIFFERENCE ( TABLE1, TABLE2) is NOT the SAME as
DIFFERENCE( TABLE2, TABLE1).
You may also perform a set division operation on two tables. The operation is described using the simple tables and example that follow.
SYNTAX:
DIVIDEBY ( TABLE1, TABLE2)
The result is defined as follows:
Problem: Find the ID of Employees who work on ALL the projects.
You can try to use the other RA functions, and will soon discover that this query is not easy to solve. The best way to answer such For-ALL queries is the use of the DIVIDEBY operation, as follows.
We first find a table listing 'ALL the Projects':
PROJECTS = PROJECT[ ProjNo] ( WORKS_ON)
| PROJECTS |
| ProjNo |
| 1 |
| 2 |
| 3 |
E_WORKS_ON = PROJECT[ ID, ProjNo]( WORKS_ON)
| E_WORKS_ON | |
|---|---|
| ID | ProjNo |
| 111 | 1 |
| 111 | 2 |
| 222 | 2 |
| 222 | 3 |
| 222 | 1 |
| 555 | 3 |
| 666 | 1 |
| 666 | 2 |
Now, we can find the answer we want by using the following query:
DIVIDEBY ( E_WORKS_ON, PROJECTS); The result is computed in the following way:
1. We note that T1unique = {ID}, T1commonT2 = {ProjNo}
2. The result therefore must be a single column table, with one attribute (ID).
3. Take the first unique value of T1unique: ID = 111.
4. For each value of ProjNo in PROJECTS, do we have a tuple in E_WORKS_ON corresponding to ID = 111 ?
NO: because there is no tuple in E_WORKS_ON with ID = 111, and ProjNo = 3.
Therefore, ID = 111 does not appear in the result.
5. Repeat step 4 for next unique value of T1unique, that is, Id = 222.
In this case, the test is true, and therefore ID = 222 appears in the result.
6. Repeat step 4 noxt unique value of T1unique, ID = 555.
The test again fails, and therefore ID = 555 is not in the result.
7. Similarly, the test fails for ID = 666; thus, ID = 666 will not appear in the output.
8. Since there are no more unique values of T1unique left, we are done, and the result is the following table:
| RESULT |
| ID |
| 222 |
This concludes our coverage of all important Relational Algebra operations. It can be shown (though we shall not do so) that RA is complete, in the sense that any modification you wold like to do on any subset of a set of relational tables can be done using some combination of RA commands. It is, as we have seen, an elegant mathematical tool. Nortice that RA is procedural -- an RA expression indicates the sequence of performing the operations, and therefore imposes some constraint on the use of smart algorithms to perhaps improve the performance of some queries. It is easy to construct two queries that will yield the same result, but require different amount of effort to compute.
One language that can logically describe the output of a query, without imposing any restrictions on the procedure that must be used to compute the output is Relational Calculus (RC). RC is equivalent to RA, in the sense that any output that can be expressed in RA can also be expressed in RC, and vice versa. In the following lectures, we shall study the most popular of all DML's for relational DBMS's: Structutred Query Language (SQL). SQL is based on relational calculus.