28.1.1 Newlines

In most operating systems, lines in files are terminated by newlines. Just what is used as a newline may vary from OS to OS. Unix traditionally uses \012, one type of DOSish I/O uses \015\012, and classic Mac OS (before Mac OS X) uses \015.

Perl uses \n to represent the "logical" newline, where what is logical may depend on the platform in use. In MacPerl, \n always means \015. In DOSish perls, \n usually means \012, but when accessing a file in "text" mode, perl uses the :crlf layer that translates it to (or from) \015\012, depending on whether you're reading or writing. Unix does the same thing on ttys in canonical mode. \015\012 is commonly referred to as CRLF.

To trim trailing newlines from text lines use chomp(). With default settings that function looks for a trailing \n character and thus trims in a portable way.

When dealing with binary files (or text files in binary mode) be sure to explicitly set $/ to the appropriate value for your file format before using chomp().
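As a minimal sketch of that advice, assume a file format whose records end in CRLF and a handle read in binary mode. The helper name chomp_crlf_lines and the in-memory sample data are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split raw (binary-mode) data into CRLF-terminated
# records and return them with the terminators removed.
sub chomp_crlf_lines {
    my ($raw) = @_;
    open(my $fh, '<', \$raw) or die "Cannot open in-memory file: $!";
    binmode($fh);                 # no :crlf translation, bytes as stored
    local $/ = "\015\012";        # records end in CRLF in this format
    my @lines;
    while (my $line = <$fh>) {
        chomp($line);             # removes the trailing $/, i.e. CRLF
        push @lines, $line;
    }
    close($fh);
    return @lines;
}

my @lines = chomp_crlf_lines("alpha\015\012beta\015\012");
print join("|", @lines), "\n";    # alpha|beta
```

Because $/ is set with local, the change is confined to the helper and does not affect chomp() or readline elsewhere in the program.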

Because of the "text" mode translation, DOSish perls have limitations in using seek and tell on a file accessed in "text" mode. If you seek only to locations you got from tell (and no others), you are usually free to use seek and tell even in "text" mode. Any other use of seek, tell, or other file operations on such a file may be non-portable. If you use binmode on a file, however, you can usually seek and tell with arbitrary values in safety.
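The safe pattern, seeking only to offsets that tell returned earlier, might look like the following sketch. The helper name reread_line and the temporary file contents are hypothetical; File::Temp is used only to make the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the line with the given index, using only
# offsets previously obtained from tell() on the same file.
sub reread_line {
    my ($path, $index) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @offsets;
    while (1) {
        my $pos = tell($fh);          # remember where this line starts
        defined(my $line = <$fh>) or last;
        push @offsets, $pos;
    }
    seek($fh, $offsets[$index], 0) or die "Cannot seek: $!";
    my $line = <$fh>;
    close($fh);
    chomp($line);
    return $line;
}

my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "first\n", "second\n", "third\n";
close($out);
print reread_line($path, 1), "\n";    # second
```

The offsets are never computed by arithmetic on line lengths, which is the step that can go wrong when "text" mode translation changes how many bytes a newline occupies on disk.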

A common misconception in socket programming is that \n eq \012 everywhere. When using protocols such as common Internet protocols, \012 and \015 are called for specifically, and the values of the logical \n and \r (carriage return) are not reliable.

print SOCKET "Hi there, client!\r\n";      # WRONG
print SOCKET "Hi there, client!\015\012";  # RIGHT

However, using \015\012 (or \cM\cJ, or \x0D\x0A) can be tedious and unsightly, as well as confusing to those maintaining the code. As such, the Socket module supplies the Right Thing for those who want it.

use Socket qw(:DEFAULT :crlf);
print SOCKET "Hi there, client!$CRLF";     # RIGHT

When reading from a socket, remember that the default input record separator $/ is \n, but robust socket code will recognize either \012 or \015\012 as end of line:

while (<SOCKET>) {
    # ...
}

Because both CRLF and LF end in LF, the input record separator can be set to LF and any CR stripped later. Better to write:

use Socket qw(:DEFAULT :crlf);
local($/) = LF;      # not needed if $/ is already \012
while (<SOCKET>) {
    s/$CR?$LF/\n/;   # not sure if socket uses LF or CRLF, OK
#   s/\015?\012/\n/; # same thing
}

This example is preferred over the previous one--even for Unix platforms--because now any \015's (\cM's) are stripped out (and there was much rejoicing).
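To try the technique without a live connection, the following sketch reads from an in-memory filehandle standing in for a socket. The helper name read_net_lines and the sample data are hypothetical, and unlike the loop above it removes the line terminator entirely instead of replacing it with \n:

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT :crlf);

# Hypothetical helper: read lines from a handle whose peer may send
# either LF or CRLF line endings, returning them without terminators.
sub read_net_lines {
    my ($fh) = @_;
    local $/ = LF;                    # both LF and CRLF end in LF
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/$CR?$LF\z//;       # drop a trailing LF or CRLF
        push @lines, $line;
    }
    return @lines;
}

my $wire = "HELO example$CRLF" . "QUIT$LF";
open(my $fh, '<', \$wire) or die "Cannot open in-memory handle: $!";
binmode($fh);
print join("|", read_net_lines($fh)), "\n";   # HELO example|QUIT
```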

Similarly, functions that return text data--such as a function that fetches a web page--should sometimes translate newlines before returning the data, if they've not yet been translated to the local newline representation. A single line of code will often suffice:

$data =~ s/\015?\012/\n/g;
return $data;

Some of this may be confusing. Here's a handy reference to the ASCII CR and LF characters. You can print it out and stick it in your wallet.

LF  eq  \012  eq  \x0A  eq  \cJ  eq  chr(10)  eq  ASCII 10
CR  eq  \015  eq  \x0D  eq  \cM  eq  chr(13)  eq  ASCII 13

         | Unix | DOS  | Mac  |
    \n   |  LF  |  LF  |  CR  |
    \r   |  CR  |  CR  |  LF  |
    \n * |  LF  | CRLF |  CR  |
    \r * |  CR  |  CR  |  LF  |
    * text-mode STDIO

The Unix column assumes that you are not accessing a serial line (like a tty) in canonical mode. If you are, then CR on input becomes "\n", and "\n" on output becomes CRLF.
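To check which column applies to the perl you are running, a short script can print the code points behind \n and \r. This is only a sketch; the helper name octal_escape is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: render a one-character string as an octal escape.
sub octal_escape {
    my ($char) = @_;
    return sprintf("\\%03o", ord($char));
}

# Reports the platform's logical \n and \r, e.g. \012 and \015 on Unix.
printf "\\n is %s, \\r is %s\n", octal_escape("\n"), octal_escape("\r");
```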

These are just the most common definitions of \n and \r in Perl. There may well be others. For example, on an EBCDIC implementation such as z/OS (OS/390) or OS/400 (using the ILE, the PASE is ASCII-based) the above material is similar to "Unix" but the code numbers change:

LF  eq  \025  eq  \x15  eq  \cU  eq  chr(21)  eq  CP-1047 21
LF  eq  \045  eq  \x25  eq           chr(37)  eq  CP-0037 37
CR  eq  \015  eq  \x0D  eq  \cM  eq  chr(13)  eq  CP-1047 13
CR  eq  \015  eq  \x0D  eq  \cM  eq  chr(13)  eq  CP-0037 13

         | z/OS | OS/400 |
    \n   |  LF  |  LF    |
    \r   |  CR  |  CR    |
    \n * |  LF  |  LF    |
    \r * |  CR  |  CR    |
    * text-mode STDIO
