Friday, January 29, 2010

Rakudo ng - what will it mean for us?

If you've been hanging around the right blogs and the #perl6 IRC channel on Freenode, then you've probably seen references to a slightly mysterious "ng", or "Rakudo ng".

That's the upcoming (next) version (generation) of Rakudo, which will form the basis for Rakudo *.

In essence, this is a refactoring/rewrite of Rakudo for the purpose of better compliance with the specification and performance improvements (yay). The old Rakudo master made it difficult — if not impossible — to implement several essential parts of the Perl 6 spec and top priorities on the Rakudo roadmap.

In January, this has shifted the focus away from the current Rakudo version's bugs and gotchas and towards preparing ng as the new master branch; that is, the Rakudo that you will be downloading the next time.

For those of us who do some Perl 6 coding in Rakudo, this means that we can expect a nice little bunch of incompatibilities as compared to the current master. And yes, it's very close, so it's time to prepare.

Here's a list of the blindingly obvious things I think we need to watch out for:
  • Older Rakudo was not in line with parts of the spec that ng will be.
  • The spec has changed. (ng development has uncovered several necessary changes.)
  • Older Rakudo is in line with parts of the spec that ng perhaps isn't.
  • Rakudo ng is, of course, not feature complete when it replaces older Rakudo as the master.

In other words: let's not fool ourselves into thinking that we all of a sudden have a new Rakudo that's both compatible with the older one and spec compliant.

The good news about Rakudo ng

If you judge by the above paragraphs, you'd think that Rakudo ng was bad for Perl 6 developers. But that's far off the mark. I prodded #perl6 and Patrick Michaud before publishing this post, and here's a brief summary of (most of) the improvements we can see coming with Rakudo ng as opposed to the current implementation.

  • Most of the top priorities of the Rakudo roadmap will be implemented!
  • Laziness will mostly work (the spec is undergoing change)
  • Performance improvements, many due to laziness
  • Array/List/Parcel/etc. will be compliant with the updated spec
  • Protoregexes
  • Better longest token matching
  • Meta-operators are really meta, and generated on demand
  • The base object metamodel is far closer to the spec than before
  • Major portions of the metamodel are implemented in Perl 6
  • Array and hash vivification will work properly
  • Lexical subs and variables work properly
  • Operators have the correct names (with angles)
  • Subs have the correct sigils (with ampersands)
  • Phasers work, and the phaser model is much improved

Our programs will need a bit of attention. I recommend subscribing to perl6-language for up-to-date information about changes to the specification and language discussions.

There's still a lot of work to be done, and I'm sure the Perl 6 developers are happy for any help they can get.

Thursday, January 21, 2010

"Your Unix Is Leaking Perl"

That has to be one of the weirdest statements I've read in, oh, at least fifteen minutes.[1]


(Image: the comic strip, linked to the original.)


A huge thanks to Saturday Morning Breakfast Cereal for this piece of wisdom, but I'm afraid that Zach Weiner got it backwards when he thought that would be expensive.

May I suggest that when a computer "leaks Perl", it does so because the (hopefully brilliant) Perl programmer is contributing a lot to CPAN?

;)


[1] Yes, I've been communicating with customers lately, how did you guess?

Thursday, January 14, 2010

feather.perl6.nl - a Sysadminish Tale

<@Juerd> frettled: Blog about the mess you found when
you first logged in on feather yesterday :)

Sure.

This will, incidentally, also explain why Trac is kind of unavailable now.

feather.perl6.nl is a Xen guest (a virtual machine, hereafter "VM") that's hosting several important services for the Perl 6 community. There's SVN web access, a Trac installation, and a bunch of other stuff I honestly don't know the half of.

Recently, the VM started running out of memory too often for comfort. What was going on? Juerd asked for help in tracking down the problem, as he didn't have the time to do so himself. And needing some distraction from work -- something to help me procrastinate -- I volunteered.

By now, you're probably banging your head on your keyboard in sympathy with me for saying something that may have been slightly less than brilliant. You know the feeling; Matt Trout reaches his right hand towards you, the world is suddenly in slow-motion, you see his hand closing in on you, his grin widening, and a voice saying "thhhhhaaaaannnnnkssss ffffooooorrr vooooluuunnnteeeerrriiiinnnnng", and you're basically up that creek with all the mud and dirt in it.

After handing over an SSH public key and getting sudo access (yeah, yeah, I know), I had a look anyway.

First, I went on a brief but wild goose chase: I found some error messages regarding ConsoleKit which appeared to be more frequent just before the server ran out of memory, checked the Debian version (an unholy mix of Debian unstable and Debian experimental with lots of package updates pending someone's attention), and generally tried to get a feel for how the system was configured.

We already knew that Apache somehow might be responsible for gobbling up available memory, so my first action was to have a look at the last 100,000 lines of the Apache access log, using a simplistic log analysis script.

But which log? There were three Apache log directories to choose from. I (correctly) guessed that the one called simply /var/log/apache2 might be the interesting one; the others seemed to be legacy directories which should have been removed ages ago.

According to the script, there were 0 accesses in the last 100,000 lines.

Knowing the script, that was not so strange, because it makes a few assumptions regarding the log format, using a regexp belonging to the days before named captures and whatnot:
while (<>) {
if (/^(\S+) (\S+) - - \[[^\]]*\] \"(GET|POST) \S*
HTTP(|\/1\.[01])\" \d{3} (\d+) \"/) {
The regexp line has been split for the sake of the line width of this blog. There's nothing to be proud of here.

Anyway, I first had to remove the first capture; feather's logs weren't showing the virtualhost as the first column, and access types were most certainly not limited to only GET and POST:
frettled@feather:~$ sudo awk '{print $6}' /var/log/apache2/access.log|
sort -u
"CHECKOUT
"CONNECT
"DELETE
"GET
"HEAD
"MERGE
"MKACTIVITY
"OPTIONS
"POST
"PROPFIND
"PROPPATCH
"PUT
"REPORT
Right.

After straightening that up (and adding %v to the LogFormat specifications in the Apache config for future use), I got the following result:
Use of uninitialized value $size in addition (+)
at /usr/local/sbin/bandwidthips line 39, <> line 1002.
Use of uninitialized value $size in addition (+)
at /usr/local/sbin/bandwidthips line 40, <> line 1002.
Use of uninitialized value $size in addition (+)
at /usr/local/sbin/bandwidthips line 44, <> line 1002.
AAAARRGH! Idiot! Imbecile! Inept half-wit! Yep, I'd forgotten to renumber my captures after removing the first one. See, this is why Perl should be in version 5.10.1 (which has named captures) or 6 when fiddling with those bloody annoying regexps. With that sorted out:
frettled@feather:~$ sudo tail -100000 /var/log/apache2/access.log|
/usr/local/sbin/hitips|head
193.200.132.146: Bytes = 14329487 (3.44%), Hits = 51503 (51.82%)
66.249.71.2: Bytes = 132116084 (31.73%), Hits = 18111 (18.22%)
66.249.71.37: Bytes = 50846948 (12.21%), Hits = 6236 (6.27%)
93.158.149.31: Bytes = 54880221 (13.18%), Hits = 1894 (1.9%)
71.194.15.106: Bytes = 460200 (0.11%), Hits = 1894 (1.9%)
209.9.237.232: Bytes = 433388 (0.1%), Hits = 1686 (1.69%)
193.200.132.135: Bytes = 1726871 (0.41%), Hits = 1635 (1.64%)
193.200.132.142: Bytes = 429358 (0.1%), Hits = 1609 (1.61%)
208.115.111.246: Bytes = 8461415 (2.03%), Hits = 1238 (1.24%)
67.218.116.133: Bytes = 18945415 (4.55%), Hits = 1126 (1.13%)
So, uhm, around 52% of the hits come from feather3.perl6.nl, and nearly 25% from Google's indexer. Lovely.
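For those without a log script at hand, a tally of this kind can be roughly approximated with awk. This is only a sketch, not the script used above: hits_per_client is a made-up name, and it assumes the whitespace-separated log layout seen earlier, with the client address in column 1 and the response size in column 10.

```shell
# Tally hits and bytes per client from an Apache access log on stdin.
# Assumed layout (no vhost column): $1 = client address, $10 = response size.
# Output columns: hits, share of hits, bytes, client address.
hits_per_client() {
    awk '
        { hits[$1]++; total++ }
        # The size column is "-" when no body was sent; only add real numbers.
        $10 ~ /^[0-9]+$/ { bytes[$1] += $10 }
        END {
            for (ip in hits)
                printf "%d %.2f%% %d %s\n", hits[ip], 100 * hits[ip] / total, bytes[ip], ip
        }
    ' | sort -k1,1nr -k4,4
}
```

It slots into the same pipeline as before: sudo tail -100000 /var/log/apache2/access.log | hits_per_client | head. Skipping non-numeric sizes also avoids adding up values that aren't numbers.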

Looking at the accesses from feather3, I quickly saw that they mostly had to do with svnweb.

Juerd had already stopped Apache, but someone -- I don't know who -- started it again at 12:00, probably anxious that SVN and such didn't work.

I then followed the running processes using the top command, updating every second (top d1), sorting by memory usage (typing M while top was running), hoping to catch some quickly growing processes.

Nopes. None, zilch, nada. Nothing that appeared horribly wrong. Sure, the apache2 processes used some memory (30-60 MB resident set, 50-100 MB virtual), but nothing appeared to be out of the ordinary. I changed the update frequency to every third second -- top sometimes uses an inordinate amount of CPU, depending on magic -- and waited. After a while, a couple of apache2 processes were using more CPU and memory than the others, around 60-90 MB resident. And they were growing. And according to lsof, they were active in the svnweb directory (and used a metric shitload of libraries). And after growing, they didn't release memory; they just kept on using it. But it wasn't enough to use up all the memory; there was still a bunch of free RAM.
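The same per-process view can be had non-interactively, which is handy if you'd rather log snapshots than stare at top. A minimal sketch: rss_by_command is a hypothetical helper, and it expects one "resident-size command-name" pair per line on stdin.

```shell
# Sum resident memory (KiB) and count processes per command name.
# Reads "rss command" pairs on stdin, one process per line.
# Output columns: total rss, number of processes, command name.
rss_by_command() {
    awk '
        NF >= 2 { rss[$2] += $1; procs[$2]++ }
        END {
            for (cmd in rss)
                printf "%d %d %s\n", rss[cmd], procs[cmd], cmd
        }
    ' | sort -k1,1nr -k3,3
}
```

Fed from ps, for example as ps -e -o rss= -o comm= | rss_by_command | head -5, it gives a per-command total to compare between runs. The two separate -o options are deliberate, since ps implementations differ in how they parse a header-suppressing "=" inside a comma-separated list.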

So that was perhaps svnweb's fault, then?

Maybe.

But then my time ran out, and I had to drop the ball, leaving the top process running.

Five minutes later, the memory ran out again. It's just as if someone was waiting for me to go idle in order to produce the problem that I was looking for.

Sigh.



svnweb kind of remained the main suspect, until Juerd caught whatever was happening at the right time.

And catching what happens at the right time is bloody important.

Here's what he found, using Apache's server status:



Well, that's not svnweb. That's Trac. And the IP addresses belong to Google.



And that's spam, effectively creating a DoS or DDoS attack on our services as a side effect when search engines try to index the Trac webpages. It probably isn't intentional, but spammers just don't care.

So, what can we do to protect feather from suffering from such attacks in the future?

There's a lot that can be done. It takes effort. It takes time. It takes someone.

Here are a few suggestions on how to improve the robustness of the kind of services feather provides:
  • Add a captcha to the web form. The disadvantage is that this does not really save processing resources, but it probably should be done anyway.
  • Add an unnecessary and bogus input field to the web form, e.g. "Phone number". This input field should be hidden with CSS so that web browsers don't display it. If a submission arrives with data in that field, you can be nearly 100% certain it's spam from someone who scraped the form and filled it in automatically. Filter it out.
  • Change the webserver delegation architecture, so that each Apache process isn't loading tens of megabytes of libraries and keeping them in memory. Off-loading to shorter-lived FastCGI daemons or similar solutions, or even sacrificing program startup speed by using CGI+suexec, etc., may be decent starting points.
  • Consider using a front-end proxy like Varnish to gloss over underlying nastiness.
  • Start with a new VM and migrate services to that one, gradually.
  • Document configuration choices and what each web service does/is there for, so that the next sysadmin coming along can make educated guesses more quickly. :)
These tasks can rather easily be split into manageable one-person projects.

Does this sound interesting to you, or did I lose you at the third line of this blog entry?

Pop in on #perl6 on the Freenode IRC network and say so.

Tuesday, January 5, 2010

Typing More Or Less

Not too long ago, there was a bit of minor cleanup in the Perl 6 specification regarding the use of whatever (*); there were some inconsistencies in how it behaved, depending on context.

The net result is that you now must use @arr[*-1] to get the last element; you cannot get away with simply using @arr[*].

Some may feel that this extra typing is bothersome, especially if you have a Unicode-friendly keyboard setup.

However, we can sneak our way past this problem by using a constant.

This also works with the current release of Rakudo, so it's not quite science fiction:
constant Ω = *-1;
Or, if you're feeling Cyrillic rather than Greek:
constant Ѡ = *-1;
Now we can substitute our nice constant for *-1 anywhere in the following code:
my @letters = 'a'..'z';
say '→'~@letters[*-1];
→z
Like so:
constant Ω = *-1;
my @letters = 'a'..'z';
say '→'~@letters[Ω];
→z
And, of course, you can do this with other things that are so tedious to type when you're dealing with maths:
constant π = pi;
say '→'~π
→3.14159265358979