libwww-perl, also called LWP, is a perl5 library that makes it very easy to access web documents directly, and to do whatever you like with the document that is returned.
For instance, you can print the document, or store it in a file on your local disk. (The latter is particularly helpful for binary files.)
Brief tutorial documentation on this package can be found at http://perl.com/perl/wwwman/libwww/lwpcook.html
More complete information on LWP, and other perl packages for the web, can be found at http://perl.com/perl/wwwman/index.html
Perl software, including LWP and many other modules, can be downloaded from here.
If your sys-admin doesn't want to install some module in system directories, or if you just want to try it out without bothering him or her, you can install the module into your own directory, for instance
/home/mtn/my-home-directory/my-perl-library
Then, in your scripts, include the line
use lib "/home/mtn/my-home-directory/my-perl-library";
You can include as many library directories as you want this way.
Here is an example of perl code you could run directly from Unix (for instance with a cron job), to grab a file from anywhere on the web:
#!/usr/local/perl5/bin/perl -wT
use LWP::Simple;
$url = "http://www.somewhere.gov/path/file-I-want.html";
$local_file = "xyz.txt";
getstore $url,$local_file; #store as a local file
#OR...
getprint $url; #print it out
Here's a perl program that actually does this.
Let's try calling get_doc.pl from unix.
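For unattended use (a cron job, say) it may help to check whether the fetch worked. Here is a minimal sketch, assuming LWP::Simple is installed; the helper name fetch_to_file and the command-line usage are made up for illustration. getstore returns the HTTP status code, and is_success (also exported by LWP::Simple) tells you whether that code means success.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;   # exports getstore and is_success, among others

# Hypothetical helper: fetch $url into $local_file.
# Returns (true-or-false, HTTP status code).
sub fetch_to_file {
    my ($url, $local_file) = @_;
    my $status = getstore($url, $local_file);   #the HTTP status code
    return (is_success($status), $status);
}

# Run as: perl fetch.pl URL LOCAL_FILE
if (@ARGV == 2) {
    my ($ok, $status) = fetch_to_file(@ARGV);
    print $ok ? "Stored $ARGV[1]\n" : "Fetch failed with status $status\n";
}
```

With no arguments the script only defines the helper and does nothing, so it is safe to experiment with.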
#get seconds-since-1970
$current_time = time();
#load the Date portion of the LWP library
use HTTP::Date;
#make a nicely-formatted time
$stringGMT = time2str($current_time);
print <<"EOI";
Expires: $stringGMT
Content-type: text/html

[put your HTML stuff here!]
EOI
The HTTP::Date library also includes the function str2time, which converts the other way (i.e., from an ASCII string, in most reasonable formats, into seconds-since-1970).
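To see the two functions side by side, here is a small round-trip sketch, assuming HTTP::Date is installed. The one-hour offset and the variable names are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use HTTP::Date;   # exports time2str and str2time

# An expiry time one hour from now, in seconds-since-1970
my $one_hour_from_now = time() + 3600;

# Seconds -> GMT string suitable for an Expires: header
my $expires = time2str($one_hour_from_now);

# GMT string -> seconds again
my $seconds = str2time($expires);

print "Expires: $expires\n";
print "Round trip ",
      ($seconds == $one_hour_from_now ? "matches" : "differs"), "\n";
```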
There are several ways around this, and most involve sending state information back to the user in a way that it is re-sent to the server in the next communication.
For example, consider this multi-stage process:
<hidden> tag in the second form that we send back to the user in response to the first form.
Note: hidden data isn't really hidden; the user can use view source to see it. But it doesn't clutter up the user's screen.
The most interesting aspect of this multi-part dialog is the script that responds to the first form, and creates the second.
Here is how it works.
(If you have a lot of information to save between calls to the server, it may be simpler to put it in a local file--with a unique file name--on the server computer, and simply send the file name back to the browser as a hidden variable.)
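The idea of carrying state in hidden fields can be sketched roughly as follows. This is not the class script; the function names, the field names, and the stage3.cgi action are all invented for illustration. The point is only that the script answering the first form writes the saved values into the second form, escaping them so that quotes or angle brackets in the user's input can't break the HTML.

```perl
use strict;
use warnings;

# Escape characters that would break an HTML attribute value.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build the second form, carrying the first form's answers
# along as hidden fields (hypothetical helper).
sub second_form {
    my ($action, %saved) = @_;
    my $html = qq{<form method="post" action="} . escape_html($action) . qq{">\n};
    for my $name (sort keys %saved) {
        $html .= sprintf qq{<input type="hidden" name="%s" value="%s">\n},
                         escape_html($name), escape_html($saved{$name});
    }
    $html .= qq{Favorite color: <input type="text" name="color">\n};
    $html .= qq{<input type="submit">\n</form>\n};
    return $html;
}

# Pretend these values arrived from the first form.
print "Content-type: text/html\n\n";
print second_form("stage3.cgi",
                  name  => 'Pat "PJ" Smith',
                  email => 'pat@example.com');
```

When the user submits the second form, the hidden name and email values come back to the server along with the new color field.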
Since we now know how to run processes from perl, on-the-fly graphics are simple.
You simply run a process that generates a .gif document, and create some html code that has a link to that .gif.
For example:
$plot_file = "/tmp/" . $$ . "my_plot.gif"; #$$ is the process id, which keeps the file name unique
open(GRAPH, "|/disk/my-graphics-program > $plot_file");
print GRAPH <<"EOI";
(commands for the graphics program go here)
...
EOI
close(GRAPH); #wait for the graphics program to finish
Then, you create some html document that has this html tag in it:
<img src="$plot_file">
The newly-created plot will appear at that place in the html document.
In this way, you can use any graphics package, for instance IDL, AVS, postscript, NCAR graphics, or anything else.
This link to <a href="clock2.cgi"> generates graphics this way.
If you wanted to have the image appear directly in an html document, you'd include this in your html code:
<img src="clock2.cgi">
open to start a separate process.
Two interesting perl graphics packages are
gd is currently installed in the web101/utilities directory.
Here is the code that generated the graphic we saw earlier. Here, again, is its output.
A more complex application (that probably would be better done in pgperl) is this start at plotting mesonet data.
Here is the script that does the work.
Homework: make the output nicer, so we can really generate mesonet time series plots from perl, and extend the script so that any variable can be plotted.
Here's an example of the kind of image more typically done with gd.
Warning: Sometimes on-the-fly graphics are not what you want. If you expect to generate the same graphic multiple times, it may be more efficient to generate the graphic once and store the gif image. Then it can be accessed multiple times with minimal CPU load on your server.
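One simple way to act on this warning is to regenerate the image only when the stored copy is missing or too old. The sketch below is an assumption about how you might do that, not something from the class scripts; the helper name, the file name, and the ten-minute limit are all made up.

```perl
use strict;
use warnings;

# Return 1 when $file is missing or older than $max_age seconds,
# 0 when the stored copy is still fresh (hypothetical helper).
sub needs_regeneration {
    my ($file, $max_age) = @_;
    return 1 unless -e $file;
    my $mtime = (stat($file))[9];   #last-modified time, seconds-since-1970
    return (time() - $mtime) > $max_age ? 1 : 0;
}

my $plot_file = "/tmp/my_plot.gif";   #made-up name for a stored plot
if (needs_regeneration($plot_file, 600)) {
    #run the graphics program here, as in the earlier example
    print "would regenerate $plot_file\n";
} else {
    print "reusing $plot_file\n";
}
```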
But sometimes you want to restrict access to a particular cgi script. This is easy to do in perl.
Here is an example of some perl code that restricts access to particular domain names (it is also a good example of using regular expressions):
#get return address (name or number)
$returnAddress = ($ENV{'REMOTE_HOST'} ||
                  $ENV{'REMOTE_ADDR'} ||
                  "(Unknown requestor)");
#match the regular expression
#(Be sure to escape the dots!)
unless ($returnAddress =~ m|fsl\.noaa\.gov|) {
    print "Content-type: text/html\n\n" .
          "<h1>$returnAddress not allowed access.</h1>";
    die;
}
#... only acceptable users get this far
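Note that the pattern above matches fsl.noaa.gov anywhere in the string, so a host named, say, fsl.noaa.gov.example.com would also get through. A slightly stricter sketch anchors the match at the end of the host name. The function name host_allowed is invented for illustration, and like the original, this only works when the server supplies a host name rather than a numeric address.

```perl
use strict;
use warnings;

# Return 1 if $host is $domain itself or a machine inside it.
# \Q...\E escapes the dots for us; \z anchors at the very end.
sub host_allowed {
    my ($host, $domain) = @_;
    return $host =~ m/(?:^|\.)\Q$domain\E\z/i ? 1 : 0;
}

my $returnAddress = $ENV{'REMOTE_HOST'} ||
                    $ENV{'REMOTE_ADDR'} ||
                    "(Unknown requestor)";
print host_allowed($returnAddress, "fsl.noaa.gov")
    ? "allowed\n"
    : "not allowed\n";
```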
Recall how the communication links usually work:
nph-, its output will go directly back to the client, bypassing the server.
Why would you want to do this?
This is a script that does that, and here is its output.
You see, the server buffered all the output and sent it back in one chunk.
On the other hand, here is the output of the same script, but renamed nph-count.cgi
(We sent the status header because the server doesn't add this to the output from nph- scripts.)
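A counting script of this kind might look roughly like the sketch below. This is not the class's nph-count.cgi, just an illustration under stated assumptions: the function names are invented, and the script only counts when the CGI variable GATEWAY_INTERFACE is set, i.e. when a web server is actually running it. The two things to notice are the status line, which an nph- script has to send itself, and the unbuffered output, so each number leaves as soon as it is printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the complete header block, including the status line
# that the server would normally add for us.
sub nph_header {
    my ($content_type) = @_;
    return "HTTP/1.0 200 OK\r\n"
         . "Content-type: $content_type\r\n"
         . "\r\n";
}

# Print the headers, then count from 1 to $limit on handle $out,
# pausing $delay seconds after each number.
sub count_to {
    my ($out, $limit, $delay) = @_;
    my $old = select($out);   #turn off perl's own buffering
    $| = 1;                   #for this handle, then restore
    select($old);             #the previous default handle
    print {$out} nph_header("text/plain");
    for my $i (1 .. $limit) {
        print {$out} "$i\n";
        sleep $delay if $delay;
    }
}

# Only count when run by a web server (CGI sets GATEWAY_INTERFACE).
count_to(\*STDOUT, 5, 1) if $ENV{GATEWAY_INTERFACE};
```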
nph- occurs when
you want your cgi script to
nph- the server will (apparently) try to keep the connection with the browser open until your cgi script and all of its sub-processes have completed. And if the person reading the browser presses the 'stop' button, the server will try to terminate your script and all its sub-processes (or rather it should!).
After the script makes its announcement, it uses open to
start a long job, then exits.
Here is the cgi script, and here is the script that it calls.
We execute this by calling unhook.cgi.
You see the problem: the server waits until the long job is completed before it sends anything back to us.
The way around this is to use nph-unhook.cgi.
The class web server will stay up for two months more only, so be sure to move anything you want to keep to another place.
My lecture notes will remain, somewhere. I'll put a link on the internal documents page of the FSL homepage, so FSL people will be able to access them.
There wasn't much interest in another O'Reilly order, so I doubt that I'll send one in, unless some interest materializes in the next week or so.
We've obviously just scratched the surface. But if you put some time into studying Learning Perl, other books, and on-line documentation, you'll find that programming for the web isn't too hard, and can be a lot of fun.
If you have any questions about the web, cgi scripts, and perl, I'm happy to share what I know.