Still more Perl and CGI scripts

Lecture 5: Still more Perl and CGI scripts

Today I'll discuss the following: Pages on today's topics:

Manipulating web documents from a unix script

A package of routines called libwww-perl, also called LWP, runs with perl5 to make it very easy to access web documents directly, and to do whatever you like with the document that is returned.

For instance, you can print the document, or store the document in a file on your local disk. (The latter is particularly helpful for binary files.)

Tutorial documentation on this package (brief) can be found at http://perl.com/perl/wwwman/libwww/lwpcook.html

More complete information on LWP, and other perl packages for the web, can be found at http://perl.com/perl/wwwman/index.html

Getting LWP

Ask your system administrator to install LWP along with perl5 (version 5.002 is the latest) onto your system.

Perl software, including LWP and many other modules, can be downloaded from here.

If your sys-admin doesn't want to install some module in system directories, or if you just want to try it out without bothering him or her, you can install the module into your own directory, for instance

/home/mtn/my-home-directory/my-perl-library
Then, in your scripts, include the line
use lib "/home/mtn/my-home-directory/my-perl-library";
You can, I believe, include as many libs as you want this way.
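As a minimal sketch of that last point: use lib accepts a list of directories, and each one is put at the front of perl's module search path, @INC. The second directory name below is hypothetical, added only for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# The second path is a made-up example; substitute your own directories.
use lib "/home/mtn/my-home-directory/my-perl-library",
        "/home/mtn/my-home-directory/more-perl-modules";

# 'use lib' puts its directories at the front of @INC, so they are
# searched before the system directories.
print "Search path:\n";
print "  $_\n" foreach @INC;
```

Running this prints the two private directories first, followed by the system directories.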

Getting a document from the web

Once the appropriate libraries have been installed, the code to get a document from the web is very simple.

Here is an example of perl code you could run directly from Unix (for instance with a cron job), to grab a file from anywhere on the web:

#!/usr/local/perl5/bin/perl -wT

use LWP::Simple;

$url = "http://www.somewhere.gov/path/file-I-want.html";
$local_file = "xyz.txt";

getstore $url,$local_file; #store as a local file
#OR...
getprint $url;  #print it out
Here's a perl program that actually does this.

Let's try calling get_doc.pl from unix.
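For a script that runs unattended, it probably helps to check whether the fetch worked. The following is a sketch only, not the get_doc.pl used in class: the subroutine name fetch_to_file is made up here. It relies on getstore returning the numeric HTTP status code, and on is_success, which LWP::Simple also exports.

```perl
#!/usr/bin/perl -w
use strict;
use LWP::Simple;   # exports getstore and is_success, among others

# Fetch $url into $local_file.
# Returns 1 on success; on failure, warns with the status code and returns 0.
sub fetch_to_file {
    my ($url, $local_file) = @_;
    my $status = getstore($url, $local_file);   # numeric HTTP status code
    return 1 if is_success($status);             # any 2xx code
    warn "Could not get $url (status $status)\n";
    return 0;
}

# Typical use (the URL is a placeholder, as in the example above):
#
#   fetch_to_file("http://www.somewhere.gov/path/file-I-want.html", "xyz.txt")
#       or exit 1;
```

With a cron job, the warning goes to standard error, which cron normally mails to you.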

A better way to set the Expires: tag

Now that we are using LWP, we have a more convenient way of setting the Expires: tag.

#get seconds-since-1970
$current_time = time();

#load the Date portion of the LWP library	
use HTTP::Date;  

#make a nicely-formatted time
$stringGMT = time2str($current_time);

print <<"EOI";
Expires: $stringGMT
Content-type: text/html

[put your HTML stuff here!]
EOI
The HTTP::Date library also includes the function str2time, which converts the other way (i.e., from an ascii string, in most reasonable formats, into seconds-since-1970).
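Here is a small sketch of both functions together. The one-hour lifetime is an arbitrary choice for illustration; the example above expires the page immediately by using the current time.

```perl
#!/usr/bin/perl -w
use strict;
use HTTP::Date;   # exports time2str and str2time

# Expire one hour from now (an arbitrary lifetime, for illustration).
my $expires = time2str(time() + 60*60);
print "Expires: $expires\n";

# Round trip: seconds -> string -> seconds.
my $string  = time2str(0);          # the start of the unix epoch
my $seconds = str2time($string);
print "$string is $seconds seconds since 1970\n";
```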

Maintaining state

Sometimes you want to have an extended dialog with the user. Say he or she provides some input, and based on that input you want to ask some more questions. This can be troublesome over the web because HTTP is a stateless protocol. That is, each contact between the browser and the server is unique--there's no history (state).

There are several ways around this, and most involve sending state information back to the user in a way that it is re-sent to the server in the next communication.

For example, consider this multi-stage process:

More on maintaining state

We take advantage of hidden fields (<input type="hidden"> tags) in the second form that we send back to the user in response to the first form.

Note: hidden data isn't really hidden; the user can use view source to see it. But it doesn't clutter up the user's screen.

The most interesting aspect of this multi-part dialog is the script that responds to the first form, and creates the second.

Here is how it works.

(If you have a lot of information to save between calls to the server, it may be simpler to put it in a local file--with a unique file name--on the server computer, and simply send the file name back to the browser as a hidden variable.)
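The script that answers the first form and creates the second might look roughly like the sketch below. It is not the script used in class: the field names, the sample values, and next_step.cgi are all hypothetical, and the saved values would normally come from the first form's input rather than being written into the script.

```perl
#!/usr/bin/perl -w
use strict;

# Make a value safe to put inside a double-quoted HTML attribute.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Turn the answers we already have into hidden form fields.
sub hidden_fields {
    my (%saved) = @_;
    my $html = "";
    foreach my $name (sort keys %saved) {
        $html .= sprintf qq{<input type="hidden" name="%s" value="%s">\n},
                         escape_html($name), escape_html($saved{$name});
    }
    return $html;
}

# Build the second form: the old answers ride along as hidden fields,
# and one new question is asked.
sub second_form {
    my (%saved) = @_;
    return qq{<form method="POST" action="next_step.cgi">\n}
         . hidden_fields(%saved)
         . qq{Favorite color: <input type="text" name="color">\n}
         . qq{<input type="submit">\n</form>\n};
}

print "Content-type: text/html\n\n";
print second_form(name => 'Pat "The Cat" Smith', station => "DEN");
```

When the user submits the second form, the server receives name and station again along with color, so the next script sees the whole conversation so far.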

On-the-fly graphics

(The book CGI Programming has a nice section on graphics for the web.)

Since we now know how to run processes from perl, on-the-fly graphics are simple.

You simply run a process that generates a .gif document, and create some html code that has a link to that .gif.

For example:

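#$$ is the process id. Inside quotes, "$$my_plot" would be read
#as a reference through $my_plot, so join the pieces with dots.
$plot_file = "/tmp/" . $$ . "my_plot.gif";
open(GRAPH, "|/disk/my-graphics-program > $plot_file")
    or die "Cannot start graphics program: $!";
print GRAPH <<"EOI";
(commands for the graphics program go here)
...
EOI
close(GRAPH);   #wait for the program to finish writing the plot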
Then, you create some html document that has this html tag in it:
<img src="$plot_file">
The newly-created plot will appear at that place in the html document, provided $plot_file is in a directory that your server makes available on the web.

In this way, you can use any graphics package, for instance IDL, AVS, postscript, NCAR graphics, or anything else.

More on-the-fly graphics

Alternatively, you can have your CGI script output graphics directly to standard output. You need no intermediate .gif file in this case.

This link to clock2.cgi generates graphics this way.

If you wanted to have the image appear directly in an html document, you'd include this in your html code:

<img src="clock2.cgi">
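A script of this kind only has to announce the image type and then copy the image data to standard output. The sketch below is not clock2.cgi itself; the subroutine name send_image is made up, and the graphics program is the placeholder from the earlier example.

```perl
#!/usr/bin/perl -w
use strict;

# Copy image data from $in to $out, preceded by the CGI header
# that tells the browser what kind of data is coming.
sub send_image {
    my ($in, $out, $content_type) = @_;
    binmode $in;                   # image data is binary
    binmode $out;
    print $out "Content-type: $content_type\n\n";
    my $buffer;
    while (read($in, $buffer, 8192)) {
        print $out $buffer;
    }
}

# In a real script the image would come straight from the graphics
# program through a pipe, for instance:
#
#   open(my $graph, "-|", "/disk/my-graphics-program")
#       or die "Cannot run graphics program: $!";
#   send_image($graph, \*STDOUT, "image/gif");
#   close($graph);
```

Passing the input and output as filehandles means the same routine works for a pipe, a stored file, or a test.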

Generating graphics in perl

It is often most convenient, and generally faster, to use a graphics package that is more tightly linked to perl, and doesn't require using open to start a separate process.

Two interesting perl graphics packages are gd and pgperl.

A look at gd

To run gd, you need to have your system administrator install it, or you can install it yourself into a local directory.

gd is currently installed in the web101/utilities directory.

Here is the code that generated the graphic we saw earlier. Here, again, is its output.
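To give a feel for gd, here is a small sketch using the GD module. It is not the code behind the graphic shown in class, and the picture it draws is arbitrary. It writes PNG rather than GIF on the assumption that not every gd installation can write GIFs; where GIF output is available, $image->gif and image/gif would take the place of the PNG calls.

```perl
#!/usr/bin/perl -w
use strict;
use GD;

# Draw a small picture: a framed box with a diagonal line and a label.
sub make_image {
    my $image = GD::Image->new(120, 80);

    # The first color allocated becomes the background.
    my $white = $image->colorAllocate(255, 255, 255);
    my $black = $image->colorAllocate(0, 0, 0);
    my $red   = $image->colorAllocate(255, 0, 0);

    $image->rectangle(0, 0, 119, 79, $black);              # frame
    $image->line(0, 79, 119, 0, $red);                     # diagonal
    $image->string(gdSmallFont, 10, 10, "hello", $black);  # label

    return $image->png;   # the image, as a string of PNG bytes
}

binmode STDOUT;
print "Content-type: image/png\n\n";
print make_image();
```

No separate process is started; the image is built in memory and printed directly.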

A more complex application (that probably would be better done in pgperl) is this start at plotting mesonet data.

Here is the script that does the work.

Homework: make the output nicer, so we can really generate mesonet time series plots from perl, and extend the script so that any variable can be plotted.

Here's an example of the kind of image more typically done with gd.

Warning: Sometimes on-the-fly graphics are not what you want. If you expect to generate the same graphic multiple times, it may be more efficient to generate the graphic once and store the gif image. Then it can be accessed multiple times with minimal CPU load on your server.
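One way to follow that advice is to regenerate the stored image only when it is missing or stale. This is a sketch under assumptions: the subroutine name cached_plot, the file name, and make_mesonet_plot in the usage comment are all hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

# Return the name of an image file, regenerating it only when it is
# missing or older than $max_age_days. $make_plot is a code reference
# that writes the image into the file whose name it is given.
sub cached_plot {
    my ($file, $max_age_days, $make_plot) = @_;
    # -M gives the file's age in days, measured from when the script started.
    unless (-e $file && (-M $file) < $max_age_days) {
        $make_plot->($file);
    }
    return $file;
}

# Typical use: redraw at most once an hour (1/24 of a day).
#
#   my $gif = cached_plot("/some/web/dir/temps.gif", 1/24,
#                         sub { make_mesonet_plot($_[0]) });
```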

Restricting access to your page

Most servers allow you to restrict access to particular directories by the name of the remote host, and/or by a user name and password. You need to get your web master to set this up for you.

But sometimes you want to restrict access to a particular cgi script. This is easy to do in perl.

Here is an example of some perl code that restricts access to particular domain names. (It is also a good example of using regular expressions.)

#get return address (name or number)
$returnAddress = ($ENV{'REMOTE_HOST'} || 
                  $ENV{'REMOTE_ADDR'} ||
                  "(Unknown requestor)");

#match the regular expression
#(Be sure to escape the dots! The (^|\.) and the $ tie the match to
# the end of the name, so fsl.noaa.gov.somewhere-else.com is refused.)
unless ($returnAddress =~ /(^|\.)fsl\.noaa\.gov$/) {
    print "Content-type: text/html\n\n" .
          "<h1>$returnAddress not allowed access.</h1>";
    exit;
}
#... only acceptable users get this far
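If you need to allow several domains, the test can be wrapped in a subroutine. The sketch below uses a made-up name, host_allowed, and the second domain is only an example. It shows why the pattern should be tied to the end of the host name: an unanchored match would also accept a name that merely contains the allowed domain.

```perl
#!/usr/bin/perl -w
use strict;

# True if $host is one of the allowed domains, or a machine inside one.
sub host_allowed {
    my ($host, @domains) = @_;
    foreach my $domain (@domains) {
        # \Q...\E escapes the dots for us; (^|\.) and $ make sure the
        # domain is matched as a whole suffix, not anywhere in the name.
        return 1 if $host =~ /(^|\.)\Q$domain\E$/i;
    }
    return 0;
}

my @allowed = ("fsl.noaa.gov", "ucar.edu");   # second domain is just an example
foreach my $host ("www.fsl.noaa.gov", "fsl.noaa.gov.example.com") {
    printf "%-26s %s\n", $host,
           host_allowed($host, @allowed) ? "allowed" : "refused";
}
```

Keep in mind that when the server does not look up host names, only the numeric REMOTE_ADDR is available, and a test on names will refuse everyone.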

Bypassing the server

Sometimes you'd like your cgi script to talk directly to the browser, bypassing the server.

Recall how the communication links usually work:

'nph-' is the prefix you need

If your cgi script's name starts with nph-, its output will go directly back to the client, bypassing the server.

Why would you want to do this?

Consider a perl script that counts from 1 to 7, taking 1 second per count.

This is a script that does that, and here is its output.

You see, the server buffered all the output and sent it back in one chunk.

On the other hand, here is the output of the same script, but renamed nph-count.cgi

(We sent the status header because the server doesn't add this to the output from nph- scripts.)
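A counting script of this kind might look roughly like the following sketch (not the actual nph-count.cgi; the subroutine name count_slowly is made up). It turns off perl's own buffering and supplies the status line itself. Because the output goes straight to the browser, the header lines end with the carriage-return/line-feed pair that HTTP calls for.

```perl
#!/usr/bin/perl -w
use strict;

# Count from 1 to $count on $out, pausing $delay seconds after each line.
sub count_slowly {
    my ($out, $count, $delay) = @_;

    # Turn off perl's own buffering on $out, so each line leaves at once.
    my $previous = select($out);
    $| = 1;
    select($previous);

    # An nph- script must supply the status line itself.
    my $protocol = $ENV{'SERVER_PROTOCOL'} || "HTTP/1.0";
    print $out "$protocol 200 OK\r\n";
    print $out "Content-type: text/plain\r\n\r\n";

    foreach my $n (1 .. $count) {
        print $out "$n\n";
        sleep $delay;
    }
}

# In the cgi script itself you would finish with:
#
#   count_slowly(\*STDOUT, 7, 1);
```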

Letting go of the browser

A perhaps more important reason to use nph- occurs when you want your cgi script to start a long job and then let go of the browser. If you don't use nph-, the server will (apparently) try to keep the connection with the browser open until your cgi script and all its sub-processes have completed. And if the person reading the browser presses the 'stop' button, the server will try to terminate your script and all its sub-processes (or rather it should!).

Here's a cgi script that starts a long job

This cgi script announces to the user that it is starting a long job. Typically, such a script would accept user input from a form.

After the script makes its announcement, it uses open to start a long job, then exits.

Here is the cgi script, and here is the script that it calls.

We execute this by calling unhook.cgi.

You see the problem: the server waits until the long job is completed before it sends anything back to us.

The way around this is to use nph-unhook.cgi.
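Whichever name the script has, the long job must not hold on to the script's standard output, or the browser's connection may stay open until the job ends. The sketch below shows one way to let go. Note that it uses fork and exec rather than the open used by the class script, and that the subroutine name start_detached and the job's path are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;
use POSIX qw(setsid _exit);

# Start @command as a separate process that no longer shares our
# standard input, output, or error, and return without waiting for it.
sub start_detached {
    my (@command) = @_;

    # Flush anything already printed, so the child does not inherit
    # a copy of our buffer and send it to the browser a second time.
    my $previous = select(STDOUT);
    $| = 1;
    select($previous);

    my $pid = fork();
    die "Cannot fork: $!" unless defined $pid;
    return $pid if $pid;           # parent: carry on at once

    # Child: let go of everything that ties us to the server.
    setsid();
    open(STDIN,  "<", "/dev/null");
    open(STDOUT, ">", "/dev/null");
    open(STDERR, ">", "/dev/null");
    exec(@command) or _exit(1);    # leave quietly if the job cannot start
}

# In the cgi script: print the status line and the announcement, then
#
#   start_detached("/home/mtn/bin/long-job.pl");
#   exit;
```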

The End

So that's the course. I hope you've learned something useful.

The class web server will stay up for only two more months, so be sure to move anything you want to keep to another place.

My lecture notes will remain, somewhere. I'll put a link on the internal documents page of the FSL homepage, so FSL people will be able to access them.

There wasn't much interest in another O'Reilly order, so I doubt that I'll send one in, unless some interest materializes in the next week or so.

We've obviously just scratched the surface. But if you put some time into studying Learning Perl, other books, and on-line documentation, you'll find that programming for the web isn't too hard, and can be a lot of fun.

If you have any questions about the web, cgi scripts, and perl, I'm happy to share what I know.