Perl Language Reference Manual
by Larry Wall and others
Paperback (6"x9"), 724 pages
ISBN 9781906966027
RRP £29.95 ($39.95)


7.33 I/O Operators

There are several I/O operators you should know about.

A string enclosed by backticks (grave accents) first undergoes double-quote interpolation. It is then interpreted as an external command, and the output of that command is the value of the backtick string, like in a shell. In scalar context, a single string consisting of all output is returned. In list context, a list of values is returned, one per line of output. (You can set $/ to use a different line terminator.) The command is executed each time the pseudo-literal is evaluated. The status value of the command is returned in $? (see 10 for the interpretation of $?). Unlike in csh, no translation is done on the return data--newlines remain newlines. Unlike in any of the shells, single quotes do not hide variable names in the command from interpretation. To pass a literal dollar-sign through to the shell you need to hide it with a backslash. The generalized form of backticks is qx//. (Because backticks always undergo shell expansion as well, see 22 for security concerns.)
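As a minimal sketch of the context rules above, the following runs the same command in scalar and list context and then inspects $?. It assumes a POSIX-style shell with an echo command; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumption: a POSIX-style shell with an `echo` command is available.
my $all   = `echo one; echo two`;   # scalar context: all output in one string
my @lines = `echo one; echo two`;   # list context: one element per output line
my $exit  = $? >> 8;                # exit status of the most recent command

# qx// is the generalized form. The backslash stops Perl from interpolating
# $HOME, so the shell expands it instead.
my $home = qx{echo \$HOME};

print "scalar: $all";
print "list has ", scalar(@lines), " lines, exit status $exit\n";
print "shell expanded \$HOME to: $home";
```

Shifting $? right by eight bits extracts the exit code; the low bits carry signal information.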

In scalar context, evaluating a filehandle in angle brackets yields the next line from that file (the newline, if any, included), or undef at end-of-file or on error. When $/ is set to undef (sometimes known as file-slurp mode) and the file is empty, it returns '' the first time, followed by undef subsequently.
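The difference between line-at-a-time reads and file-slurp mode can be sketched with an in-memory filehandle, which avoids depending on any real file. The data and variable names here are assumptions made for illustration.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $first = <$fh>;                      # one line, newline included
my $rest  = do { local $/; <$fh> };     # $/ undefined: everything that is left
my $eof   = <$fh>;                      # nothing left, so undef

close $fh;
print "first line: $first";
print "slurped remainder: $rest";
print "after end-of-file: ", defined $eof ? $eof : 'undef', "\n";
```

Localizing $/ inside a do block restores the normal line terminator as soon as the block exits.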

Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a while statement (even if disguised as a for(;;) loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) The $_ variable is not implicitly localized. You'll have to put a local $_; before the loop if you want that to happen.
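Because $_ is not localized automatically, a subroutine that loops over a filehandle can clobber its caller's $_. The sketch below uses a hypothetical helper, count_lines, to show where local $_ fits; the in-memory data is made up for the example.

```perl
use strict;
use warnings;

# count_lines is a hypothetical helper, not part of Perl itself.
sub count_lines {
    my ($fh) = @_;
    local $_;                 # protect the caller's $_ from the implicit assignment
    my $count = 0;
    $count++ while <$fh>;     # each line is assigned to $_ automatically
    return $count;
}

my $text = "a\nb\nc\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

$_ = 'important';
my $count = count_lines($fh);
close $fh;

print "read $count lines; \$_ is still '$_'\n";
```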

The following lines are equivalent:

while (defined($_ = <STDIN>)) { print; }
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while defined($_ = <STDIN>);
print while ($_ = <STDIN>);
print while <STDIN>;

This also behaves similarly, but avoids $_:

while (my $line = <STDIN>) { print $line }

In these loop constructs, the assigned value (whether assignment is automatic or explicit) is then tested to see whether it is defined. The defined test avoids problems where a line has a string value that Perl would treat as false, for example a "" or a "0" with no trailing newline. If you really mean for such values to terminate the loop, they should be tested for explicitly:

while (($_ = <STDIN>) ne '0') { ... }
while (<STDIN>) { last unless $_; ... }
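To see why the implicit defined test matters, the sketch below reads in-memory data whose final line is a bare "0" with no trailing newline. The data is invented for the example; the loop form is the one shown earlier with a lexical variable.

```perl
use strict;
use warnings;

my $data = "first\n0";    # the final line is "0" with no trailing newline
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {    # Perl treats this as defined(my $line = <$fh>)
    chomp $line;
    push @seen, $line;
}
close $fh;

print "lines seen: @seen\n";
```

Without the defined test, the string "0" would be false and the loop would stop before processing the last line.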

In other boolean contexts, <filehandle> without an explicit defined test or comparison elicits a warning if the use warnings pragma or the -w command-line switch (the $^W variable) is in effect.

The filehandles STDIN, STDOUT, and STDERR are predefined. (The filehandles stdin, stdout, and stderr will also work except in packages, where they would be interpreted as local identifiers rather than global.) Additional filehandles may be created with the open() function, amongst others. See "Tutorial on opening things in Perl" (perlopentut) in Perl Tutorials for details on this.

If a <FILEHANDLE> is used in a context that is looking for a list, a list comprising all input lines is returned, one line per list element. It's easy to grow to a rather large data space this way, so use with care.

<FILEHANDLE> may also be spelled readline(*FILEHANDLE). See the documentation for the readline function.
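The readline spelling follows the same context rules as the angle brackets. Here is a small sketch using an in-memory filehandle with made-up data:

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $first = readline($fh);    # scalar context: the next line only
my @rest  = readline($fh);    # list context: every remaining line

close $fh;
print "first: $first";
print "remaining lines: ", scalar(@rest), "\n";
```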

The null filehandle <> is special: it can be used to emulate the behavior of sed and awk. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. The loop

while (<>) {
    ...                     # code for each line
}

is equivalent to the following Perl-like pseudo code:

unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift) {
    open(ARGV, $ARGV);
    while (<ARGV>) {
        ...         # code for each line
    }
}

except that it isn't so cumbersome to say, and will actually work. It really does shift the @ARGV array and put the current filename into the $ARGV variable. It also uses filehandle ARGV internally. <> is just a synonym for <ARGV>, which is magical. (The pseudo code above doesn't work because it treats <ARGV> as non-magical.)
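The shifting of @ARGV and the setting of $ARGV can be observed directly. This sketch creates two temporary files with File::Temp to stand in for command-line arguments; the file contents and the %lines_per_file hash are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small temporary files to stand in for command-line arguments.
my @names;
for my $content ("a1\na2\n", "b1\n") {
    my ($out, $name) = tempfile(UNLINK => 1);
    print {$out} $content;
    close $out or die "Cannot close $name: $!";
    push @names, $name;
}

@ARGV = @names;
my %lines_per_file;
while (<>) {
    $lines_per_file{$ARGV}++;    # $ARGV holds the name of the file being read
}

printf "%d files read, %d arguments left in \@ARGV\n",
    scalar(keys %lines_per_file), scalar(@ARGV);
```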

Since the null filehandle uses the two-argument form of open(), it interprets special characters, so if you have a script like this:

while (<>) {
    print;
}

and call it with perl dangerous.pl 'rm -rfv *|', it actually opens a pipe, executes the rm command and reads rm's output from that pipe. If you want all items in @ARGV to be interpreted as file names, you can use the module ARGV::readonly from CPAN.
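One way to avoid the problem without an extra module is to open each argument yourself with the three-argument form of open, which treats the name literally. The helper read_literal_files below is hypothetical and only sketches the idea; it assumes no file named 'echo pwned |' exists in the current directory.

```perl
use strict;
use warnings;

# read_literal_files is a hypothetical helper: it opens every name with
# three-argument open, so characters such as '|' are never interpreted.
sub read_literal_files {
    my @names = @_;
    my (@lines, @failed);
    for my $name (@names) {
        my $fh;
        unless (open $fh, '<', $name) {
            push @failed, $name;    # record the failure and move on
            next;
        }
        push @lines, <$fh>;
        close $fh;
    }
    return (\@lines, \@failed);
}

my ($lines, $failed) = read_literal_files('echo pwned |');
print "lines read: ", scalar(@$lines), "; could not open: @$failed\n";
```

Perl 5.22 and later also provide the double-diamond operator <<>>, which behaves like <> but opens each argument with three-argument open.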

You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you really want. Line numbers ($.) continue as though the input were one big happy file. See the example in the documentation for the eof function for how to reset line numbers on each file.
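A sketch of the per-file reset mentioned above: closing ARGV at the end of each file resets $. before the next file is opened. The temporary files and the @log array are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two temporary files stand in for files named on the command line.
my @names;
for my $content ("a1\na2\n", "b1\n") {
    my ($out, $name) = tempfile(UNLINK => 1);
    print {$out} $content;
    close $out or die "Cannot close $name: $!";
    push @names, $name;
}

@ARGV = @names;
my @log;
while (<>) {
    chomp;
    push @log, "$.:$_";
} continue {
    close ARGV if eof;    # eof without parentheses: end of the current file only
}

print "@log\n";
```

Without the continue block, the line from the second file would be numbered 3 rather than 1.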

If you want to set @ARGV to your own list of files, go right ahead. This sets @ARGV to all plain text files if no @ARGV was given:

@ARGV = grep { -f && -T } glob('*') unless @ARGV;

You can even set them to pipe commands. For example, this automatically filters compressed arguments through gzip:

@ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;

If you want to pass switches into your script, you can use one of the Getopt modules (such as Getopt::Std or Getopt::Long) or put a loop on the front like this:

while ($_ = $ARGV[0], /^-/) {
    shift;
    last if /^--$/;
    if (/^-D(.*)/) { $debug = $1 }
    if (/^-v/)     { $verbose++  }
    # ...           # other switches
}
while (<>) {
    # ...           # code for each line
}
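For comparison, here is a sketch of the module-based alternative using Getopt::Long. It parses a sample argument list held in @args rather than the real @ARGV so that it can run stand-alone; the option names mirror the hand-written loop, and the sample values are made up.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Sample arguments standing in for a real command line.
my @args = ('-v', '-D', 'trace', 'input.txt');

my ($debug, $verbose) = ('', 0);
GetOptionsFromArray(
    \@args,
    'D=s' => \$debug,      # -D takes a string value
    'v+'  => \$verbose,    # each -v increments the counter
) or die "Bad command-line options\n";

# Recognized switches have been removed; only filenames remain in @args.
print "debug=$debug verbose=$verbose files=@args\n";
```

In a real script you would call GetOptions, which works on @ARGV directly, and then read the remaining arguments with <>.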

The <> symbol will return undef for end-of-file only once. If you call it again after this, it will assume you are processing another @ARGV list, and if you haven't set @ARGV, will read input from STDIN.

If what the angle brackets contain is a simple scalar variable (e.g., <$foo>), then that variable contains the name of the filehandle to input from, or its typeglob, or a reference to the same. For example:

$fh = \*STDIN;
$line = <$fh>;

If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means <$x> is always a readline() from an indirect handle, but <$hash{key}> is always a glob(). That's because $x is a simple scalar variable, but $hash{key} is not--it's a hash element. Even <$x > (note the extra space) is treated as glob("$x "), not readline($x).
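A practical consequence of this syntactic rule is that a filehandle stored in a hash element cannot be read with angle brackets directly. The sketch below shows two workarounds; the %handles hash and the in-memory data are assumptions made for the example.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";
my %handles;
open $handles{in}, '<', \$text or die "Cannot open in-memory file: $!";

# <$handles{in}> would be parsed as a glob(), so call readline explicitly...
my $line = readline($handles{in});

# ...or copy the handle into a simple scalar variable first.
my $fh   = $handles{in};
my $next = <$fh>;

close $fh;
print "via readline: $line";
print "via simple scalar: $next";
```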

One level of double-quote interpretation is done first, but you can't say <$foo> because that's an indirect filehandle as explained in the previous paragraph. (In older versions of Perl, programmers would insert curly brackets to force interpretation as a filename glob: <${foo}>. These days, it's considered cleaner to call the internal function directly as glob($foo), which is probably the right way to have done it in the first place.) For example:

while (<*.c>) {
    chmod 0644, $_;
}

is roughly equivalent to:

open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
    chomp;
    chmod 0644, $_;
}

except that the globbing is actually done internally using the standard File::Glob extension. Of course, the shortest way to do the above is:

chmod 0644, <*.c>;

A (file)glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before it will start over. In list context, this isn't important because you automatically get them all anyway. However, in scalar context the operator returns the next value each time it's called, or undef when the list has run out. As with filehandle reads, an automatic defined is generated when the glob occurs in the test part of a while, because legal glob returns (e.g. a file called 0) would otherwise terminate the loop. Again, undef is returned only once. So if you're expecting a single value from a glob, it is much better to say

($file) = <blurch*>;

than

$file = <blurch*>;

because the latter will alternate between returning a filename and returning false.
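The alternation can be observed with a small experiment. This sketch creates a temporary directory containing a single matching file and calls the same glob in scalar context three times. It assumes the temporary directory path contains no whitespace or glob metacharacters, since glob() treats whitespace as a pattern separator.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: the temporary path has no whitespace or glob metacharacters.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(blurch1.txt other.txt)) {
    open my $out, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $out;
}

# List context: the whole list is consumed at once, so this is repeatable.
my ($file) = glob("$dir/blurch*");

# Scalar context: the same glob operator is evaluated three times in a row.
my @results;
for my $attempt (1 .. 3) {
    my $next = glob("$dir/blurch*");
    push @results, defined $next ? 'name' : 'undef';
}

print "list context: $file\n";
print "scalar context: @results\n";
```

The second scalar call finds the list exhausted and returns undef; the third starts a new list.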

If you're trying to do variable interpolation, it's definitely better to use the glob() function, because the older notation can cause people to become confused with the indirect filehandle notation.

@files = glob("$dir/*.[ch]");
@files = glob($files[$i]);