Wednesday, June 24, 2009

Virus scanning with File::Scan::ClamAV

This is almost ridiculously easy.

Problem: a bunch of user directories need virus scanning and per-user reports

Solution: A Perl script using File::Scan::ClamAV

Prerequisites:
  • A unixy OS
  • Perl 5.8 or 5.10
  • A functional and running clamd, preferably listening on a socket
  • The module File::Scan::ClamAV (and its dependencies)

The following code could (after some sensible adjustments) run in a loop through all usernames on your system.

use File::Scan::ClamAV;
# (...)
# $dir contains the full path of the user's directory
my $av = File::Scan::ClamAV->new(find_all => 1,
                                 port     => '/tmp/clamd.socket');
# find_all means that we wish to recurse directories and report every
# infected file, rather than stopping at the first hit.
# /tmp/clamd.socket is where my clamd has its socket.
# Other clamd configurations may differ.

unless ($av->ping) {
    plogdie "clamd isn't running, aborting virus scan";
} else {
    plog "Performing virus scan for $uname";

    # Save virus information per username ($uname).
    # Note! scan() returns a hash (file name => virus name),
    # so we store a reference to an anonymous copy of it.
    $a_viruses{$uname} = { $av->scan($dir) };

    if (%{$a_viruses{$uname}}) {
        my @vfiles = sort keys %{$a_viruses{$uname}};
        plog "$uname has ".@vfiles." viruses.";

        # Home assignment: print contents of $a_viruses{$uname}
        # to a file, using the sorted list @vfiles.
    }
}
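
One possible take on the home assignment, as a minimal sketch only. It assumes that $a_viruses{$uname} holds a reference to a hash of file name => virus name. The sub name write_virus_report and the report layout are invented for illustration and are not part of File::Scan::ClamAV.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Scan::ClamAV): writes a header
# line plus one line per infected file to $report_file, sorted by file name.
# $viruses is a reference to a hash of file name => virus name.
sub write_virus_report {
    my ($report_file, $uname, $viruses) = @_;
    my @vfiles = sort keys %$viruses;

    open(my $fh, '>', $report_file)
        or die "Cannot write $report_file: $!";
    print {$fh} "Virus report for $uname: ", scalar(@vfiles), " infected file(s)\n";
    print {$fh} "$_: $viruses->{$_}\n" for @vfiles;
    close($fh) or die "Cannot close $report_file: $!";

    # Return the number of infected files written to the report.
    return scalar @vfiles;
}
```

Inside the scanning loop it could be called as write_virus_report("$dir/virus-report.txt", $uname, $a_viruses{$uname}); the report path here is just an example.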

Monday, June 15, 2009

Frequently freaky freakin' one-liners

So, hey, I'm sitting here without anything good to blog about, probably like most people on the net.

I'm wondering which parts of my daily Perl usage are even vaguely useful and could be improved upon.

Ah, of course, triple-f one-liners!

As a tool, the perl command often seems to replace a jungle of echo + egrep + cut + tr + sed + awk and whatnot. perl -nawe and ctrl+r (reverse i-search) in bash are good friends of mine, but after using the same one-liners a few times in a row, I usually end up converting them to tidy files with Getopt::Long, comments and other insanities.

And at some later stage, I say to myself: damnit, I should've coded this more generally. I start a recode, get distracted, solve a new problem with one-liners, and the circle of life goes on.

Do I need professional help?

Monday, June 8, 2009

DateTime performance hit

This is mainly in answer to the comment to my previous post on revisiting plog.

plog is a piece of code ready for copy+paste into whatever codebase you're in at the moment, and almost completely agnostic. Yes, in code where I'm already using DateTime, I use those routines.

You could also create a small sub to hide the "nasty idiom".
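
A minimal sketch of such a sub, using only core Perl. The name timestamp is made up for illustration, and the optional epoch argument is an addition of mine that plog itself does not need.

```perl
use strict;
use warnings;

# Hypothetical helper: formats an epoch time (default: now) as
# "YYYY-MM-DD HH:MM:SS" in local time, hiding the localtime/sprintf idiom.
sub timestamp {
    my $epoch = defined $_[0] ? $_[0] : time();
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime($epoch);
    return sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                   $year + 1900, $mon + 1, $mday, $hour, $min, $sec);
}

print timestamp(), "\n";
```

The call site then shrinks to a single timestamp() call, and the formatting lives in one place if it ever needs to change.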

Or you could use Date::Format instead (ca. 10 kB codebase instead of 110 kB, and between one and two orders of magnitude quicker for this particular purpose):
require Date::Format;   # require imports nothing, so qualify the call
my @lt = localtime();
$dt = Date::Format::strftime("%Y-%m-%d %T", \@lt);

Of course, that leaves you with the problem of whether the TimeDate packages are installed or not, and another test for that.

Besides, the bloat is not exactly insignificant if you have code that's running repeatedly.

The following example with 100,000 repetitions may seem ludicrous, but I actually have production code where timestamped log entries run into that order of magnitude. And yes, I would very much like to save that extra time.

Edit 2009-06-09 00:32 UTC: thanks to Dave Rolsky for the simplified testing code, and to Ilmari for reminding me of POSIX::strftime, which I'd previously rejected based on other people's claims that it was dog slow. I've removed the home-grown tests in favour of the results from a slightly modified version of Dave's sample test script.

Here are the numbers for 100k iterations, times derived from the rates:

Solution               Time       Rate
DateTime               490.2 s    204/s
DateTime (cached tz)   76.7 s     1 304/s
Date::Format           6.4 s      15 528/s
POSIX                  3.3 s      30 030/s
localtime              0.6 s      163 934/s
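
For anyone who wants to repeat part of the comparison, here is a rough sketch using the core Benchmark module. It is not the test script mentioned above, it covers only the two core-Perl variants (POSIX and localtime), and the sub names are invented for illustration. Absolute numbers will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use POSIX ();

# Format the current local time with POSIX::strftime.
sub ts_posix {
    return POSIX::strftime("%Y-%m-%d %H:%M:%S", localtime);
}

# Format the current local time by hand from localtime().
sub ts_localtime {
    my @lt = localtime;
    return sprintf("%d-%02d-%02d %02d:%02d:%02d",
                   $lt[5] + 1900, $lt[4] + 1, $lt[3], $lt[2], $lt[1], $lt[0]);
}

# Run each variant 100_000 times and print a rate comparison to STDOUT.
cmpthese(100_000, {
    'POSIX'     => \&ts_posix,
    'localtime' => \&ts_localtime,
});
```

DateTime and Date::Format can be added to the hash passed to cmpthese in the same way, provided they are installed.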

Sunday, June 7, 2009

Print-and-log revisited

A month ago, I made a post with a simple print-and-log subroutine called plog.

I was recently asked two questions about this piece of code, and I'll answer them briefly now:
  1. "What's up with the curly braces on a separate line after the sub declaration?"
  2. "Why don't you use DateTime?"
Okay, those should be easy to answer while retaining the illusion of clue:
  1. That's merely a personal preference; it helps me notice quickly whether something is a subroutine or a control statement block. I am aware that many Perl programmers disagree, and prefer all blocks to start in the same way regardless of what kind of block it is.
  2. DateTime is an external module which, believe it or not, is not installed on all systems. Some people would also argue that it's bloated and slow. But in case you want to use DateTime instead, and/or check whether it's available at run time, then ... well, see below.
sub plog
{
    my $msg = shift;
    my $level = shift;
    my $dt;
    eval {
        require DateTime;
        $dt = DateTime->now(time_zone=>'local')->strftime("%F %T");
    };
    if ($@) {
        my @lt = localtime;
        # Format current datetime manually:
        $dt = sprintf("%d-%02d-%02d %02d:%02d:%02d",
                      $lt[5]+1900,$lt[4]+1,
                      $lt[3],$lt[2],$lt[1],$lt[0]);
    }

(...)
As you can see, there isn't much code saved by using DateTime, even if we know it's already installed and don't need to add paranoia. The method of extracting data from localtime() is well-known, proven, and fairly nice on resources. Use DateTime if you want to, but perhaps it's best to save it for when you need to do more complicated stuff.