Perl: Practical Extraction and Report Language

Introduction to Arrays, File I/O, and other neat stuff

This tutorial uses examples borrowed from the online Perl manual provided by Larry Wall, the creator of Perl. There are many other online Perl tutorials, which you can easily find with any of the common search engines such as Yahoo, Excite, etc.


In the first set of examples, we learnt about (a) scalar variables ($x), (b) reading input from the keyboard (<STDIN>), (c) conditional branches: if (test) { if-true-statements } else { if-false-statements }, (d) iterations using while (test) { statements }, which repeats while the test is true, and until (test) { statements }, which repeats while the test is false, and (e) many operators, including the following:
Comparing numbers: ==, !=, >, <, >=, <=
Comparing strings: eq, ne, gt, lt, ge, le
Assignment: =
Ternary branch: (test) ? if-true : if-false;
Integer increment/decrement: $x++, $x--, ++$x, --$x;
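Perl has two sets of comparison operators because the same pair of values can compare differently as numbers and as strings. The sketch below (the values are made up for illustration) compares "10" and "10.0" both ways, and also uses the ternary operator from the list above.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $x = "10";
my $y = "10.0";

# == compares the values as numbers; eq compares them character by character.
my $as_numbers = ($x == $y) ? "equal" : "different";   # 10 and 10.0 are the same number
my $as_strings = ($x eq $y) ? "equal" : "different";   # "10" and "10.0" are different text
print "as numbers: $as_numbers\n";   # as numbers: equal
print "as strings: $as_strings\n";   # as strings: different

# Ordering differs too: as text, "9" comes after "10" because '9' > '1'.
my $num_first = (9 < 10)      ? "9" : "10";   # "9"
my $str_first = ("9" lt "10") ? "9" : "10";   # "10"
print "first numerically: $num_first, first as text: $str_first\n";
```

Use the numeric operators (==, <, ...) for numbers and the string operators (eq, lt, ...) for text; mixing them up is a common source of surprising results.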

In this series, we shall learn (a) some useful Perl functions, (b) how to use arrays, and (c) how to read from and write to files.

Example1.pl

#!/usr/local/bin/perl

$thecount = 0;
$line = <STDIN>;
while ($line ne "") {
        chop ($line);
        @words = split(/ /, $line);
        $wordindex = 1;
        while ($wordindex <= @words) {
                if ($words[$wordindex-1] eq "the") {
                        $thecount += 1;
                }
                $wordindex++;
        }
        $line = <STDIN>;
}
print ("Total occurrences of \"the\": $thecount\n");

NOTES:

split is a Perl function with two arguments. It breaks its second argument into pieces at each point where the pattern in the first argument matches; the matching text itself is discarded. The pattern is a regular expression written between two '/' characters. E.g. split(/a/, "trash") produces two strings: "tr" and "sh". Since split can return more than one string, its result is normally stored in an array.
Arrays are identified by an '@' sign in front of the name: @words is an array called words. Arrays may have any number of elements, and each element must be a scalar (that is, a number or a character string).
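A small sketch of split, using the "trash" example from the notes plus a made-up sentence split on single spaces the way Example1.pl splits each input line:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# split breaks its second argument wherever the pattern matches;
# the matched text ("a") is thrown away.
my @pieces = split(/a/, "trash");
my $count = @pieces;                 # an array used as a number gives its length
print "$count pieces: @pieces\n";    # 2 pieces: tr sh

# Splitting a line into words on single spaces, as Example1.pl does.
my @words = split(/ /, "the cat saw the dog");
print "first: $words[0], last: $words[4]\n";   # first: the, last: dog
```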

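Array elements are numbered from 0. To get the 1st element of the array called "words", we write:
$words[0]
Similarly, the 3rd element of the array is $words[2]. Note the '$' sign: a single element is a scalar, so it is written with '$', not '@'. This is also why Example1.pl counts $wordindex from 1 but looks up $words[$wordindex-1].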

If you use an array where a number is expected, its value is the number of elements in the array! For example, the test condition of the inner while loop reads:
$wordindex <= @words
Here @words evaluates to the number of elements in the array called "words".
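The indexing rules above can be tried out on a short made-up list. The sketch also shows two related features: a negative index counts back from the end of the array, and $#words gives the index of the last element.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @words = ("Here", "is", "a", "list.");

my $first = $words[0];    # "Here": the first element has index 0
my $third = $words[2];    # "a": the third element has index 2
my $last  = $words[-1];   # "list.": a negative index counts back from the end
my $count = @words;       # 4: the array used as a number
my $top   = $#words;      # 3: $#words is the index of the last element

# Visit every element, counting the index from 0 rather than from 1.
my $index = 0;
while ($index < @words) {
    print "$index: $words[$index]\n";
    $index++;
}
```

Counting from 0 and testing $index < @words (rather than <=) avoids the "-1" adjustment used in Example1.pl.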


Example2.pl
#!/usr/local/bin/perl

$grandtotal = 0;
$line = <STDIN>;
while ($line ne "") {
        $linetotal = 0;
        @numbers = split(/ /, $line);
        $numbercount = 1;
        while ($numbercount <= @numbers) {
                $linetotal += $numbers[$numbercount-1];
                $numbercount++;
        }
        print("line total: $linetotal\n");
        $grandtotal += $linetotal;
        $line = <STDIN>;
}
print("grand total: $grandtotal\n");

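NOTES: Another use of the split function: each line is split at the spaces into numbers, which are then added up. The lines are not chopped here, because Perl ignores the trailing newline when it converts a string such as "3\n" to a number.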
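One thing to watch with split(/ /, ...) is that it splits at every single space, so two spaces in a row produce an empty field. In Example2.pl an empty field simply adds 0 (with an "isn't numeric" warning if warnings are turned on). The sketch below, using a made-up input line, compares this with the special pattern ' ', which splits on runs of whitespace and drops the trailing newline.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $line = "1  2 3\n";    # two spaces after the 1, and a trailing newline

my @single = split(/ /, $line);   # ("1", "", "2", "3\n"): an empty field between the two spaces
my @fields = split(' ', $line);   # ("1", "2", "3"): ' ' splits on runs of whitespace

my $linetotal = 0;
my $index = 0;
while ($index < @fields) {
    $linetotal += $fields[$index];
    $index++;
}
print "with / /: ", scalar(@single), " fields; with ' ': ", scalar(@fields), " fields\n";
print "line total: $linetotal\n";   # line total: 6
```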


Example3.pl
#!/usr/local/bin/perl

@lines = <STDIN>;
chop (@lines);
$longlongline = join(" ", @lines);
@words = split(/ /, $longlongline);
@words = reverse sort (@words);
$index = 0;
print("Words sorted in reverse order:\n");
while ($index < @words) {
        # print a word only if it differs from the one before it;
        # the first word (index 0) has no predecessor, so it is
        # always printed. ($words[-1] would be the LAST element
        # of the array, not an empty string.)
        if ($index == 0 || $words[$index] ne $words[$index-1]) {
                print ("$words[$index]\n");
        }
        $index++;
}

NOTES:

Here we learn more functions in Perl:

sort returns the elements of a list sorted in ascending order. By default it compares them as strings, so numbers are not sorted by value (10 comes before 9); to sort numbers, write sort { $a <=> $b } @array.
reverse returns the elements of a list in reverse order (last to first). Here it is applied to the sorted list, so the words end up in descending order.
join joins all elements of an array into a single string, inserting its first argument between each pair of adjacent elements.
Notice also how assigning <STDIN> to an array (@lines = <STDIN>) reads all the remaining lines of input at once instead of one line, storing one line per element.
Notice also that chop can take an array as its argument. In this case it removes the last character (here, the newline) from each element of the array.
You may also have noticed that text from a '#' sign to the end of the line is a comment. The first line, beginning with '#!', is special: it tells the system where to find the Perl interpreter.
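The functions above can be compared side by side on a small made-up list of numbers. The sketch shows the default (string) sort next to a numeric sort, then reverse and join. It also contrasts chop with chomp, a related built-in that removes a trailing newline only if there is one.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 2);

my @as_text    = sort @numbers;                 # compares as strings: 10 100 2 9
my @as_numbers = sort { $a <=> $b } @numbers;   # compares as numbers: 2 9 10 100
my @descending = reverse @as_numbers;           # last to first: 100 10 9 2

print join(" ", @as_text), "\n";       # 10 100 2 9
print join(", ", @as_numbers), "\n";   # 2, 9, 10, 100
print join("-", @descending), "\n";    # 100-10-9-2

# chop always removes the last character; chomp removes only a trailing newline.
my @chopped = ("alpha\n", "beta");
my @chomped = ("alpha\n", "beta");
chop(@chopped);    # ("alpha", "bet")
chomp(@chomped);   # ("alpha", "beta")
print "@chopped / @chomped\n";   # alpha bet / alpha beta
```

chomp is the safer choice when the last line of input might not end in a newline.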


Example4.pl
#!/usr/local/bin/perl

# collect the random numbers
$count = 1;
while ($count <= 100) {
        $randnum = int( rand(10) ) + 1;
        $randtotal[$randnum] += 1;
        $count++;
}

# print the total of each number
$count = 1;
print ("Total for each number:\n");
while ($count <= 10) {
        print ("\tnumber $count: $randtotal[$count]\n");
        $count++;
}


NOTES:

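Here we see how to generate random numbers. rand returns a random floating-point number that is at least 0 and less than its argument, so int(rand(10)) + 1 is an integer from 1 to 10. Note also that the array @randtotal is never set up in advance: its elements are created automatically the first time they are used.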
Some other mathematical functions you may use are: sin, cos, atan2, sqrt, log and abs.
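As a further illustration of rand, here is a sketch that simulates 1000 rolls of a six-sided die (a made-up example) and checks the smallest and largest values, followed by a few of the mathematical functions listed above. The roll values are random, so the first printed line changes from run to run.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# rand(6) is at least 0 and less than 6, so int(rand(6)) is 0 to 5
# and adding 1 gives 1 to 6.
my @rolls;
my $count = 0;
while ($count < 1000) {
    $rolls[$count] = int(rand(6)) + 1;
    $count++;
}
my @sorted = sort { $a <=> $b } @rolls;
print "smallest roll: $sorted[0], largest roll: $sorted[-1]\n";

# A few of the other mathematical functions.
my $pi = 4 * atan2(1, 1);   # atan2(1, 1) is pi/4
print "sqrt(16) = ", sqrt(16), ", abs(-3) = ", abs(-3), ", pi = $pi\n";
```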


Example5.pl
#!/usr/local/bin/perl

@array = ("old1", "old2", "old3", "old4");
@array[1,2] = ("new2", "new3");
print ("@array\n");
print "@array ", "\n";


NOTES:

Three things to note here:
First, notice how the array can be initialised simply by assigning a list of values to it.
Next, note how @array[1,2] (called a slice) refers to several elements at once; here it replaces the 2nd and 3rd elements, so the program prints "old1 new2 new3 old4".
Notice the different ways to use the print function! You may omit the '(' and ')' when calling print, even with several arguments. Inside double quotes, an array is printed with a space between its elements.
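A slice can be read as well as assigned, and because the right-hand side is evaluated before the assignment, a slice can also swap elements in one statement. A short sketch, starting from the same array as Example5.pl:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @array = ("old1", "old2", "old3", "old4");

my @middle = @array[1, 2];          # reading a slice: ("old2", "old3")
@array[1, 2] = ("new2", "new3");    # assigning to just those two elements
@array[0, 3] = @array[3, 0];        # swapping the first and last elements

my $text = "@array";                # interpolation puts a space between elements
print "$text\n";                        # old4 new2 new3 old1
print "single element: $array[1]\n";    # single element: new2
```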


Example6.pl
#!/usr/local/bin/perl

# read the array from standard input one item at a time
print ("Enter the array to sort, one item at a time.\n");
print ("Enter an empty line to quit.\n");
$count = 1;
$inputline = <STDIN>;
chop ($inputline);
while ($inputline ne "") {
        $array[$count-1] = $inputline;
        $count += 1;
        $inputline = <STDIN>;
        chop ($inputline);
}

# now sort the array
$count = 1;
while ($count < @array) {
        $x = 1;
        while ($x < @array) {
                if ($array[$x - 1] gt $array[$x]) {
                        @array[$x-1,$x] = @array[$x,$x-1];
                }
                $x++;
        }
        $count++;
}

# finally, print the sorted array
print ("@array\n");

NOTES:

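This program implements a "bubble sort": it passes over the array repeatedly, swapping each pair of neighbouring elements that are out of order (compared as strings with gt), so that larger elements "bubble" towards the end. It gives the same result as the sort function, but is much slower on large arrays. Note the slice assignment @array[$x-1,$x] = @array[$x,$x-1], which swaps two elements in one statement.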
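A common refinement of the bubble sort (not part of Example6.pl) is to stop as soon as a pass makes no swaps, since the array is then already in order. The sketch below applies this to a made-up list and compares the result with the built-in sort.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @array = ("pear", "apple", "fig", "banana");
my @expected = sort @array;   # the built-in sort, for comparison

# Keep passing over the array, swapping neighbours that are out of
# order, until a complete pass makes no swaps.
my $swapped = 1;
while ($swapped) {
    $swapped = 0;
    my $x = 1;
    while ($x < @array) {
        if ($array[$x - 1] gt $array[$x]) {
            @array[$x - 1, $x] = @array[$x, $x - 1];   # swap using a slice
            $swapped = 1;
        }
        $x++;
    }
}
print "@array\n";   # apple banana fig pear
```

On input that is already sorted, this version finishes after a single pass.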


Example7.pl
#!/usr/local/bin/perl

unless (open(INFILE, "file1")) {
        die ("cannot open input file file1\n");
}
unless (open(OUTFILE, ">outfile")) {
        die ("cannot open output file outfile\n");
}
$line = <INFILE>;
while ($line ne "") {
        print OUTFILE ($line);
        $line = <INFILE>;
}
close (INFILE);
close (OUTFILE);

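NOTES: Now we look at an example of how to read from and write to files.
open connects a file handle (here INFILE or OUTFILE) to a file. A plain file name such as "file1" opens the file for reading; a name beginning with '>' opens it for writing, creating the file or overwriting its old contents ('>>' appends instead). open returns false if it fails, so unless (open(...)) { die(...) } stops the program with an error message. Reading <INFILE> returns the next line of the file, and print OUTFILE ($line) writes to the output file; note that there is no comma after the file handle. Files may have any name, and so may file handles. This tutorial uses INFILE for input files and OUTFILE for output files, and names such as INFILE1, INFILE2 when a program opens several files.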
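Newer Perl programs often write open with three arguments (handle, mode, file name) and store the handle in a variable instead of a bareword such as INFILE. The sketch below writes two lines to a file, reads them back, and then deletes it; the file name example_out.txt is hypothetical, and $! holds the system's error message if open fails.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $filename = "example_out.txt";   # hypothetical file name

# ">" opens for writing (creating or overwriting the file).
open(my $out, ">", $filename) or die("cannot open $filename for writing: $!\n");
print $out "first line\n";          # no comma after the file handle
print $out "second line\n";
close($out);

# "<" opens for reading; <$in> in list context reads every line.
open(my $in, "<", $filename) or die("cannot open $filename for reading: $!\n");
my @lines = <$in>;
close($in);
chomp(@lines);
print scalar(@lines), " lines, last: $lines[-1]\n";   # 2 lines, last: second line

unlink($filename);   # remove the temporary file
```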


Example8.pl
#!/usr/local/bin/perl

@words = ("Here", "is", "a", "list.");
foreach $word (@words) {
        print ("$word\n");
}

NOTES:

I find that foreach is a very convenient way to iterate over the members of an array. In this example, the loop body is executed once for each element of the array @words, and each time the variable $word holds the next element of the array.
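Two more things foreach makes easy are sketched below on the same list as Example8.pl: accumulating a total over the elements, and changing the elements in place. The loop variable is an alias for each element, so assigning to it changes the array itself.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @words = ("Here", "is", "a", "list.");

# Count the characters in all the words.
my $characters = 0;
foreach my $word (@words) {
    $characters += length($word);
}

# Assigning to the loop variable changes the element in the array.
foreach my $word (@words) {
    $word = uc($word);   # uc converts to upper case
}
print "@words\n";                    # HERE IS A LIST.
print "characters: $characters\n";   # characters: 12
```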

This is the end of this series of examples, and their associated notes.


Exercises

Now that we have seen so many examples of Perl scripts, can we write our own? Let's try a few really simple things:
Exercise. (a) Use pico to write a file (called "data") which looks like the following:
Name	Age	Hobby
=======================
John	20	Football
... add at least 5 lines, with name, age and hobby of your friends.

(b) Write a Perl program that will read the file ("data"), calculate the following, and print out their values:

(i) What is the total number of people?
(ii) What is the average age of the people?
(iii) What is the age of the oldest person?