About

This is the archive page for Head of the Kyu.

30 October 06 - 15:58

A handy heuristic for auditing source code

Clients often ask me to take a look at some project's code base. "What do you think of the design? Is this well-factored?"

The last time this happened, I spent a few minutes looking at the code and came up with several observations. "There's a huge amount of generated code here, you might want to watch the complexity cost of that... This code here looks quite complex and has little separation of concerns... Here is an entire class missing unit tests... This class here has long stretches of duplicated code..."

Then my client had an interesting remark. "I don't know how you do this - spot so many things in such a short time." I realized I didn't know how I did it, until he asked - I just did it.

My technique, for what it's worth, consists of getting a file list of all source code files, then sorting that list by (decreasing) file size. I start with the largest source file, and see if there's a reason it's the largest.
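For readers who want to try this, here is a minimal sketch of the file list in Python. It is only one way to do it, not the exact tool I use; the function name `files_by_size` and the set of extensions are assumptions to adapt to the project at hand.

```python
from pathlib import Path

# Assumption: which extensions count as "source code" depends on the project.
SOURCE_EXTENSIONS = {".java", ".cs", ".cpp", ".h", ".py"}


def files_by_size(root, extensions=SOURCE_EXTENSIONS):
    """Return (size_in_bytes, path) pairs for source files under root, largest first."""
    entries = [
        (path.stat().st_size, path)
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in extensions
    ]
    # Sort on size only, in decreasing order.
    return sorted(entries, key=lambda entry: entry[0], reverse=True)


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    # Print the twenty largest source files, biggest first.
    for size, path in files_by_size(root)[:20]:
        print(f"{size:>10}  {path}")
```

Run it against a local checkout and start reading from the top of the output.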

  • Generated code
Very often, generated code is excessively verbose, so if a project uses code generation at all, those files tend to be at the top of the list. There are a number of anti-patterns related to code generation; I'll make a note to look into that later. (Does it happen that people have to hand-edit generated code? Are the generated code files checked into version control?) I then delete these files and generate the file list again. (Yes, delete - I'm only working on a local copy and everything is in version control anyway. If that's not the case, don't bother with the audit - getting the team to start using version control takes priority.)

  • Core class
Another reason for a class to float to the top is that it "does everything". "Yes, it's a big class - but it's the object that's at the center of all our design, the entire app revolves around it." I'll check for a large number of defects associated with that class. Not all code is equally defect-prone; a small number of source files often accounts for a disproportionate fraction of all defects. Classes (or source modules in general) tend to have fewer defects when they do one and only one thing: querying the database OR crunching numbers OR formatting reports. A class that does everything will do everything badly.

Languages like Java or C# have a useful convention of "one file, one class" which makes it easier to spot overlarge classes by looking at file sizes. In some C++ codebases things can get more complicated - but then, that's generally useful information too.

  • Smelly code
Even a class with reasonably focused responsibility can bloat up fast if you put your mind to it - that's what Copy and Paste are for. Accordingly, the largest source files are a good place to look for design smells: duplicated code, long methods, switch statements, etc. When the team is supposed to use TDD, I also check the unit tests for that class or module. TDD generally results in smaller methods (because code is written one test at a time, and tests must exercise methods in isolation) and limited duplication (because we're supposed to refactor each time we've made a test pass). So, either the team has not been applying TDD... or something else is happening that I should know about.

  • Other observations
It's also interesting to sum the file sizes (you can compare, for instance, the amount of test code and application code; an XP project is supposed to show roughly a 1:1 ratio), figure out the average, or look at the smallest files.
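The sums mentioned above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the helper names `is_test_file` and `size_summary` are made up for the example, and test code is assumed to be recognisable by "test" in the file name, which will not hold for every project.

```python
from pathlib import Path


def is_test_file(path):
    # Assumption: test code is recognisable by "test" in the file name.
    return "test" in path.name.lower()


def size_summary(root, extension=".java"):
    """Sum file sizes under root, split between test code and application code."""
    files = [p for p in Path(root).rglob(f"*{extension}") if p.is_file()]
    sizes = {p: p.stat().st_size for p in files}
    test_bytes = sum(s for p, s in sizes.items() if is_test_file(p))
    app_bytes = sum(s for p, s in sizes.items() if not is_test_file(p))
    return {
        "test_bytes": test_bytes,
        "app_bytes": app_bytes,
        # None when there is no application code to compare against.
        "test_to_app_ratio": test_bytes / app_bytes if app_bytes else None,
        "average_bytes": sum(sizes.values()) / len(sizes) if sizes else 0,
        # The five smallest files, smallest first.
        "smallest": sorted(files, key=lambda p: sizes[p])[:5],
    }
```

A `test_to_app_ratio` far below 1 on a project that claims to follow XP is worth a question to the team. Byte counts are a crude proxy, but they are cheap to get.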

It's not that I know exactly what I'm doing or what I'm looking for when I do this sort of thing. I know that there's some interesting research concerning, for instance, power law distributions in source code. But as far as I'm aware there are no clear models of why this or that size distribution should arise, thus little practical guidance if you find code that does not obey the expected distribution. (I'd appreciate any pointers from readers in the direction of relevant research.)

The point is more that, looking at the "macroscopic" properties of source code, I will reach results faster than by poring over details. These results will orient my next steps, if I want to drill down to the details; or, quite often, they will point to immediate steps the team can take to improve its design - which is the kind of advice I get paid for.


29 October 06 - 16:38

Head of the Kyu

No, this blog isn't dead. It's... resting.

At any rate, it was, for the past few months - other things took priority, then the "three week rule" intervened (they say it takes 21 days to form a habit, or lose one in this case). Then I decided that a long pause in writing was OK, a normal pattern of renewal.

Then a few weeks ago, I had an idea that I wanted to write down. When I tried to log into the blogging tool, it no longer knew me. "Invalid login," it said, cold as a judge. What? I haven't been away that long! You're a computer, you're not supposed to forget people!

It seems that my hosting provider upgraded the server's OS, or something of the sort, and that change was incompatible with something in my previous blog engine. I'm not going into the details - I had to figure out exactly what was wrong, or I couldn't have gone on thinking of myself as a techie. But they are quite uninteresting and best forgotten.

I made a note to install a new blogging tool when I had time, and today was the day. I shopped around for a new blogging tool, my main requirements being that it could import my old entries (renewal yes, scorched earth no) and not require me to use MySQL. (I prefer to deal with a single set of problems, whenever possible.)

I settled on a new name for the blog. It's a somewhat obscure multiple pun involving two distinct cultures - I find it rather fitting, we'll see how it gets across.

Now to think of something interesting to write. See you soon.
