Perl file checks: whether a file exists, its age, whether it is a directory, etc.

Submitted by barnettech on Fri, 11/20/2009 - 11:57

http://www.devshed.com/c/a/Perl/File-Tests-in-Perl/

File Tests in Perl 

In this article, you will learn how to find out useful information about files in Perl. It is excerpted from chapter 11 of the book Learning Perl, Fourth Edition, written by Randal L. Schwartz, Tom Phoenix and brian d foy (O'Reilly; ISBN: 0596101058). Copyright © 2006 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.
Earlier, we showed how to open a filehandle for output. Normally, that will create a new file, wiping out any existing file with the same name. Perhaps you want to check that there isn’t a file by that name. Perhaps you need to know how old a given file is. Or perhaps you want to go through a list of files to find which ones are larger than a certain number of bytes and have not been accessed for a certain amount of time. Perl has a complete set of tests you can use to find information about files.

File Test Operators
Before we start a program that creates a new file, let’s make sure the file doesn’t already exist so that we don’t accidentally overwrite a vital spreadsheet data file or that important birthday calendar. For this, we use the -e file test, testing a filename for existence:

  die "Oops! A file called '$filename' already exists.\n"
    if -e $filename;

We didn’t include $! in this die message since we’re not reporting that the system refused a request in this case. Here’s an example of checking if a file is being kept up to date. In this case, we’re testing an already opened filehandle instead of a string file name. Let’s say that our program’s configuration file should be updated every week or two. (Maybe it’s checking for computer viruses.) If the file hasn’t been modified in the past 28 days, then something is wrong:

  warn "Config file is looking pretty old!\n"
    if -M CONFIG > 28;

The third example is more complex. Let’s say disk space is filling up; rather than buy more disks, we’ve decided to move any large, useless files to the backup tapes. So let’s go through our list of files to see which of them are larger than 100 KB. But even if a file is large, we shouldn’t move it to the backup tapes unless it hasn’t been accessed in the last 90 days (so we know it’s not used too often):

  my @original_files = qw/ fred barney betty wilma pebbles dino bamm-bamm /;
  my @big_old_files; # The ones we want to put on backup tapes
  foreach my $filename (@original_files) {
    push @big_old_files, $filename
      if -s $filename > 100_000 and -A $filename > 90;
  }

This is the first time that you’ve seen it, so maybe you noticed that the control variable of the foreach loop is a my variable. That declares it to have the scope of the loop, so this example should work under use strict. Without the my keyword, this would be using the global $filename.
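The third example can be tried out as a small, self-contained sketch. The helper name big_old_files and its threshold parameters are our own invention for illustration, not something from the book; the extra -e check is there only so that names that don't exist are skipped quietly under use warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of files that are larger than
# $min_bytes and have not been accessed for more than $min_days days.
sub big_old_files {
    my ($min_bytes, $min_days, @names) = @_;
    my @found;
    foreach my $filename (@names) {
        next unless -e $filename;    # skip names that do not exist
        push @found, $filename
            if -s $filename > $min_bytes and -A $filename > $min_days;
    }
    return @found;
}

# Same thresholds as the example above: 100_000 bytes and 90 days.
my @candidates = big_old_files(100_000, 90, qw/ fred barney betty wilma /);
print "Move to backup tapes: @candidates\n";
```

Passing the thresholds as arguments is just one design choice; it makes the sketch easy to run against a few small test files before pointing it at real data.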

The file tests look like a hyphen and a letter, which is the name of the test, followed by a filename or a filehandle to test. Many of them return a true/false value, but several give something more interesting. See Table 11-1 for the complete list and read the following discussion to learn more about the special cases.

Table 11-1.  File tests and their meanings

File test	Meaning
-r	File or directory is readable by this (effective) user or group
-w	File or directory is writable by this (effective) user or group
-x	File or directory is executable by this (effective) user or group
-o	File or directory is owned by this (effective) user
-R	File or directory is readable by this real user or group
-W	File or directory is writable by this real user or group
-X	File or directory is executable by this real user or group
-O	File or directory is owned by this real user
-e	File or directory name exists
-z	File exists and has zero size (always false for directories)
-s	File or directory exists and has nonzero size (the value is the size in bytes)
-f	Entry is a plain file
-d	Entry is a directory
-l	Entry is a symbolic link
-S	Entry is a socket
-p	Entry is a named pipe (a “fifo”)
-b	Entry is a block-special file (like a mountable disk)
-c	Entry is a character-special file (like an I/O device)
-u	File or directory is setuid
-g	File or directory is setgid
-k	File or directory has the sticky bit set
-t	The filehandle is a TTY (as reported by the isatty() system function; filenames can’t be tested by this test)
-T	File looks like a “text” file
-B	File looks like a “binary” file
-M	Modification age (measured in days)
-A	Access age (measured in days)
-C	Inode-modification age (measured in days) 

The tests -r, -w, -x, and -o tell if the given attribute is true for the effective user or group ID, which essentially refers to the person who is in charge of running the program. These tests look at the permission bits on the file to see what is permitted. If your system uses Access Control Lists (ACLs), the tests will use those as well. These tests generally tell if the system would try to permit something, but it doesn’t mean that it really would be possible. For example, -w may be true for a file on a CD-ROM, though you can’t write to it, or -x may be true on an empty file, which can’t truly be executed.

The -s test does return true if the file is non-empty, but it’s a special kind of true. It’s the length of the file, measured in bytes, which evaluates as true for a nonzero number.
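Here is a minimal sketch of that "special kind of true," alongside -e and -z from the table. The helper describe_size is a hypothetical name of ours, and the temporary file comes from the standard File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: describe a file using the -e, -z, and -s tests.
sub describe_size {
    my ($filename) = @_;
    return 'missing' unless -e $filename;
    return 'empty'   if -z $filename;
    my $bytes = -s $filename;    # true here, and also the size in bytes
    return "$bytes bytes";
}

my ($fh, $filename) = tempfile(UNLINK => 1);
print "Before writing: ", describe_size($filename), "\n";

print {$fh} "hello";
close $fh or die "Cannot close '$filename': $!\n";
print "After writing: ", describe_size($filename), "\n";
```

Because -s gives back the size itself, the same value can be used as the condition and then reported, with no second call needed.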

A Unix filesystem has seven types of items, represented by the seven file tests -f, -d, -l, -S, -p, -b, and -c. Any item should be one of those. If you have a symbolic link pointing to a file, that will report true for -f and -l. So if you want to know whether something is a symbolic link, you should generally test that first. (You’ll learn more about symbolic links in Chapter 12.)
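One way to put the seven type tests together, with -l tested first as suggested, is sketched here. The helper entry_type and its return strings are our own hypothetical choices.

```perl
use strict;
use warnings;

# Hypothetical helper: name the type of a filesystem entry.
# -l comes first because the other tests follow a symbolic link
# and report on whatever it points to.
sub entry_type {
    my ($name) = @_;
    return 'symbolic link'     if -l $name;
    return 'missing'           unless -e $name;
    return 'plain file'        if -f $name;
    return 'directory'         if -d $name;
    return 'socket'            if -S $name;
    return 'named pipe'        if -p $name;
    return 'block special'     if -b $name;
    return 'character special' if -c $name;
    return 'unknown';
}

# Report on every name given on the command line.
print "$_: ", entry_type($_), "\n" foreach @ARGV;
```

Putting -l ahead of -e also means a symbolic link whose target has gone away is still reported as a link rather than as missing.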

The age tests, -M, -A, and -C (yes, they’re uppercase) return the number of days since the file was last modified, accessed, or had its inode changed. (The inode contains all of the information about the file except for its contents. See the stat system call manpage or a good book on Unix internals for details.) This age value is a full floating-point number, so you might get a value of 2.00001 if a file were modified two days and one second ago. These “days” aren’t necessarily the same as a human would count. For example, if it’s 1:30 A.M. when you check a file modified at about an hour before midnight, the value of -M for this file would be around 0.1, even though it was modified “yesterday.”

When checking the age of a file, you might get a negative value like -1.2, which means that the file’s last access timestamp is set at about thirty hours in the future. The zero point on this timescale is the moment your program started running, so that value might mean a long-running program was looking at a file that had just been accessed. Or a timestamp could be set (accidentally or intentionally) to a time in the future.
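The arithmetic behind these ages can be checked with a short sketch. It assumes the built-in utime function and the $^T variable (the program's start time), neither of which is covered in this excerpt, and the helper name ages_after_touch is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: set both timestamps of $filename to $seconds before
# the program's start time ($^T), then return what -M and -A report.
sub ages_after_touch {
    my ($filename, $seconds) = @_;
    my $when = $^T - $seconds;
    utime($when, $when, $filename)
        or die "Cannot set times on '$filename': $!\n";
    return (-M $filename, -A $filename);
}

my ($fh, $filename) = tempfile(UNLINK => 1);
close $fh or die "Cannot close '$filename': $!\n";

# Two days and one second in the past.
my ($modified, $accessed) = ages_after_touch($filename, 2 * 24 * 60 * 60 + 1);
printf "Two days and one second ago: -M = %.5f\n", $modified;

# Thirty hours in the future (a negative offset).
($modified, $accessed) = ages_after_touch($filename, -30 * 60 * 60);
printf "Thirty hours in the future: -A = %.2f\n", $accessed;
```

On a filesystem that stores timestamps to the second, the first value should come out near 2.00001 and the second near -1.25, in line with the figures in the text.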

The tests -T and -B determine if a file is text or binary. But people who know a lot about filesystems know there’s no bit (at least in Unix-like operating systems) to indicate that a file is a binary or text file, so how can Perl tell? The answer is that Perl cheats: it opens the file, looks at the first few thousand bytes, and makes an educated guess. If it sees a lot of null bytes, unusual control characters, and bytes with the high bit set, then that looks like a binary file. If there’s not much weird stuff, then it looks like text. It sometimes guesses wrong. If a text file has a lot of Swedish or French words (which may have characters represented with the high bit set, as some ISO-8859-something variant, or perhaps even a Unicode version), it may fool Perl into declaring it binary. So it’s not perfect, but if you need to separate your source code from compiled files, or HTML files from PNGs, these tests should do the trick.

You’d think that -T and -B would always disagree since a text file isn’t a binary and vice versa, but there are two special cases where they’re in complete agreement. If the file doesn’t exist, or can’t be read, both are false since it’s neither a text file nor a binary. Alternatively, if the file is empty, it’s an empty text file and an empty binary file at the same time, so they’re both true.
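The two special cases are easy to see with a few small files. This is only a sketch: the helpers text_or_binary and write_bytes are hypothetical names, and the sample contents were picked to make Perl's guess unambiguous.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: report Perl's guess about a file's contents.
sub text_or_binary {
    my ($filename) = @_;
    my $text   = -T $filename;
    my $binary = -B $filename;
    return 'both'    if $text and $binary;      # an empty file
    return 'neither' if !$text and !$binary;    # missing or unreadable
    return $text ? 'text' : 'binary';
}

# Hypothetical helper for the demonstration: create a file from raw bytes.
sub write_bytes {
    my ($filename, $bytes) = @_;
    open my $fh, '>:raw', $filename
        or die "Cannot create '$filename': $!\n";
    print {$fh} $bytes;
    close $fh or die "Cannot close '$filename': $!\n";
}

my $dir = tempdir(CLEANUP => 1);
write_bytes("$dir/empty", '');
write_bytes("$dir/notes", "Just some plain words.\n" x 20);
write_bytes("$dir/blob",  "\0\x01\x02\xff" x 64);

print "$_: ", text_or_binary("$dir/$_"), "\n"
    foreach qw/ empty notes blob missing /;
```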

The -t file test returns true if the given filehandle is a TTY—if it’s interactive because it’s not a simple file or pipe. When -t STDIN returns true, it generally means that you can interactively ask the user questions. If it’s false, your program is probably getting input from a file or pipe, rather than a keyboard.
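A program might use -t to decide whether it makes sense to prompt. The following is a minimal sketch; the helper input_mode and its return strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: say whether a filehandle looks interactive.
sub input_mode {
    my ($handle) = @_;
    my $is_tty = -t $handle;
    return $is_tty ? 'interactive' : 'file or pipe';
}

my $mode = input_mode(\*STDIN);
print "STDIN looks like: $mode\n";
print "What is your name? " if $mode eq 'interactive';
```

Run from a terminal, this should report an interactive STDIN; run with input redirected from a file or piped from another command, it should report the other case and skip the prompt.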