IEEM 230 Industrial Data Systems

Assignment 5
No Due Date: Model will be posted by Dec 9

Q.1. The Employee table of our example has the following maximum attribute sizes: Fname (16bytes), Minit (1 byte), Lname (20 bytes), SSN (9 bytes), Bdate (9 bytes), Address (50 bytes), Sex (1 byte), Salary (8 bytes), SuperSSN (9 bytes) and DNo (2 bytes).

  1. Assume that 1 block = 1KB (1024 bytes), field separator= 2 bytes, record separator= 2 bytes.
    The table currently has 10,000 records. How many blocks will be required to store the table ?

    Total size of each record = 16+2 + 1+2 + 20+2 + 9+2 +9+2 + 50+2 + 1+2 + 8+2 + 9+2 +2+2 = 145 bytes.
    No of records per block = int(1024/145) = 7
    No of blocks required to store main file = 10000/7 = 1428.57, or 1429 blocks.

  2. The data is stored in an unordered file (heap file) on the hard disk. If the average seek time is 20 msec, and the average block transfer time is 1 msec, then what is the worst-case time required to find a given record? [assume that the CPU time is negligible compared to the data-access times].

    Total time to search for data in one block = 20 + 1 = 21 msec = 0.021 sec
    In the worst case, we need to look at each block; total time = 0.021 x 1429 = 30.009 or nearly 30 sec.

  3. If the data is sorted by the SSN, then calculate the worst case time required to search a record, given the SSN.

    If the data is sorted by SSN, we can perform binary search. Log2 (1429) = 10.48.
    In the worst case, 11 blocks will need to be searched, taking 11 x 0.021 = 0.231 sec.

  4. If the data is sorted by the SSN, then calculate the worst case time required to search a record, given the Lname.

    Since Lname values are not sorted, in the worst case, we will need to look at each block to find the record. This will take 1429 x 0.021 = 30.009 sec.

  5. The SSN is used to create an index file for this table. What is the worst case time spent in searching for a record given the SSN ? [assume that this is the primary index, each Block address= 4 Bytes, and the index file also has 2 byte field- and record-separators].

    Since SSN index is a primary index, only 1429 rows of data are stored.
    The format of each row (record) is SSN [field separator] Block Address [record separator]
    Total length of each record = 9+2 + 4+2 = 17 bytes
    Number of records stored in one block of Index file = int( 1024/17) = 60 records.
    Size of index file = total number of records / no of records per block = 1429/60 = 23.8, or 24 blocks.
    Search for a given SSN in index file will require looking at Log2(24) = 4.58, or 5 blocks.
    This locates the block address of record in the main file, which requires 0.021 sec to look up.
    Total (worst case) serach time = 5 x 0.021 + 0.021 = 0.126 sec.

  6. The Lname is used to create a (secondary) index file. What is the worst case time to delete a record, given the Lname value ?

    Using Lname as a secondary index requires each record with 20+2 + 4+2 = 28 bytes.
    No of records per block = int( 1024/28) = 36 records
    since this is a secondary index, we need one entry for each record of data in main file (table) = 10,000.
    Total number of blocks required to store secondary index file = 10000/36 = 277.7, or 278 blocks.
    To delete a record, we must first find the record, we must (a) Find the location(block address) of the record in main file; (b) mark this record as deleted in this record, and write this block back on hard disk.
    Worst case time to search the secondary index file for a record requires 0.021 x Log2( 278) = 0.021 x 9 = 0.189 sec.
    This gives us the location of the block in main file. To read this block, and then re-write it requires 0.021 + 0.021 sec = 0.042 sec.
    Total (worst case) time for the operation = 0.189 + 0.042 = 0.231 sec.