... and wrong.   Never have I seen such a cluster of these answers around the questions of "How long will <X> take?", and "Are we there yet?". Today I'm going to describe a bit about how to answer them, and be a lot less wrong in the process.

How long will <X> take?
This is the classic estimation problem.  A wiser man than myself said "Normally longer than you think", but while accurate, it's not very fulfilling nor informative.  My experience is that most people are pretty damn good at estimating how long "their bit" takes, and what's more - how long "their bit" takes when they actually get to work on it.  However, nearly everybody is terrible at understanding the effects of external influences on the time.  This is where the significant source of error in estimation occurs.  Not only that - this is where great caution needs to be applied to the "well, it took Z days last time" without understanding the influences on that outcome.

At this point we get to segue into a real world example of exactly this.  I ride to work every day.  I've ridden to work every day for over 18 months.  ~ 6km each way for 12 months to ANZ (~400 samples), and ~ 10km each way for 6 months to REA (~200 samples).  I travel on the same route every day and at approximately the same time every day.  I have a Garmin 500 GPS unit that tracks all my travels - so I have a long historical record of doing exactly the same thing every day.  With all this wonderful data, you would think I'd be able to accurately predict how long it takes me to get to and/or from work.  Here's the news, for what is an average of 30 min journey, I cannot predict within 10% what my journey time will be.  How the fuck is that possible?  My fastest time home is 25 minutes, and my slowest is nearly 35 minutes.  

So, you're an astute reader (well, you're reading my blog - so you must be), you're scratching your head trying to work out how I'm getting nearly a 30% variation over the time.  Time of day? (no) Weather? (no) Fitness? (no) Bike chosen to ride on? (no) Traffic? (no)

Here's the crucial piece of information that my awesome Garmin unit has.  It has my average speed, and my average _moving_ speed.  Turns out that my average moving speed is very stable (pretty low variation) - a fairly comfortable 26km/hr.  So, if my moving speed is constant at 26km/hr - how on earth is there a 30% variation?

Externalities.  In this case - traffic lights.  There are 30 traffic and pedestrian lights on my trip into work.  I've not done the analysis on all of them - but I know that 2 of the traffic lights have a cycle of 2 minutes.  So - from a best case of 0, to a worst case of 4 minutes - that's more than 10% of a 30 minute trip just from those two.  Wow.  On the upside, I can say that my _expected_ time is 30 minutes, but it could be from 25 to 35 minutes.
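To make the traffic light effect concrete, here's a rough simulation sketch.  The 10km distance, 26km/hr moving speed and 30 lights come from the numbers above; the chance of hitting a red (`p_red`) and the longest wait at any one light (`max_wait_s`) are pure assumptions, because I haven't measured them.

```python
import random
import statistics

def simulate_commute(distance_km=10.0, moving_speed_kmh=26.0,
                     n_lights=30, max_wait_s=60.0, p_red=0.5,
                     trips=1000, seed=1):
    """Return simulated door-to-door times in minutes.

    Moving speed is held constant; the only source of variation is
    the time spent waiting at lights.
    """
    rng = random.Random(seed)
    moving_min = distance_km / moving_speed_kmh * 60.0
    times = []
    for _ in range(trips):
        # Assumption: each light is red with probability p_red, and a red
        # costs a wait spread evenly between 0 and max_wait_s seconds.
        wait_s = sum(rng.uniform(0.0, max_wait_s)
                     for _ in range(n_lights) if rng.random() < p_red)
        times.append(moving_min + wait_s / 60.0)
    return times

times = simulate_commute()
print(f"moving time: {10.0 / 26.0 * 60.0:.1f} min")
print(f"min {min(times):.1f}, mean {statistics.mean(times):.1f}, "
      f"max {max(times):.1f}")
```

With these made-up light parameters the moving part is about 23 minutes and the expected wait is 30 x 0.5 x 30 seconds = 7.5 minutes, so the average lands near the 30 minutes I actually see.  The spread comes entirely from the lights.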

So, here's a tip.  When looking at estimating - even for things you know you do all the time - look at the external influences on the task at hand.  Count them - that should give you a good idea of the level of variation that may occur.  More external influences that you don't have any control over - the lower the confidence that should be placed, and the greater the need to have a conversation about "minimum, expected and maximum".
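Counting the influences can be turned into a quick "minimum, expected and maximum" calculation.  This is only a sketch: the function name is made up, the 26 minute base is a hypothetical figure, and it assumes each delay is equally likely to fall anywhere between zero and its worst case.

```python
def estimate_range(base_minutes, max_delays_minutes):
    """Return (minimum, expected, maximum) minutes for a task.

    base_minutes: time taken when nothing external gets in the way.
    max_delays_minutes: worst-case delay for each external influence.
    Assumption: each delay is uniform between 0 and its worst case,
    so its expected value is half the worst case.
    """
    worst = sum(max_delays_minutes)
    return (base_minutes, base_minutes + worst / 2.0, base_minutes + worst)

# The two lights with a 2 minute cycle, on a hypothetical 26 minute base.
low, expected, high = estimate_range(26.0, [2.0, 2.0])
print(low, expected, high)  # 26.0 28.0 30.0
```

The longer the list of delays you can't control, the wider the gap between the first and last numbers, and the less a single-figure estimate means.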

This is also a very good reason to use synthetic values for estimation (function points, story points) and, instead of predicting task and project length up front, to determine it by tracking the work actually completed.

Are we there yet?
Not only is this the dreaded question for parents of children, it's also a bleeding sore for most software developers.  Provided you've already moved away from aggregating single estimates in hours or days and have decided that a synthetic proxy is the way to go (great first step) we need to have some coherent way of determining when we're likely to be finished.

We've all read the "past performance does not guarantee future performance", yet that is exactly what we're doing when we take a time-slice of the project, and then project the work already completed to determine the end-point.  The good (and bad) news is that while we should keep the quote in the back of our heads, there's not a better way to determine the end-point.

However, the big kicker is this, the value of "past performance" is relative to the volume of activity performed, and the amount of variance external activities have caused on those activities _relative_ to the possible impacts remaining.  The first ~200m of my journey has no traffic lights, so it should come as no surprise that there's low variation, but also should come as no surprise that the predictive value of that first 200m of the journey is low.  Yet, I see people doing this every day in projects.  "At the end of the first iteration we did 40 units of work, excellent - we're going to finish in <X>" and then getting frustrated, angry or disappointed that next iteration only 20 units of work was completed.
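Here's a small sketch of why one iteration of data is worth so little.  The function name and the 200 units of remaining work are hypothetical; the 40 and 20 are the iteration results from the example above.  It projects the end-point from the fastest, average and slowest iterations tracked so far.

```python
import math
import statistics

def forecast_iterations(remaining_units, completed_per_iteration):
    """Return (best, expected, worst) iterations left, from tracked throughput."""
    fastest = max(completed_per_iteration)
    slowest = min(completed_per_iteration)  # a zero here would divide by zero
    average = statistics.mean(completed_per_iteration)
    return (math.ceil(remaining_units / fastest),
            math.ceil(remaining_units / average),
            math.ceil(remaining_units / slowest))

print(forecast_iterations(200, [40]))      # (5, 5, 5)
print(forecast_iterations(200, [40, 20]))  # (5, 7, 10)
```

With a single sample the best, expected and worst cases are identical, which looks like certainty but is really just an absence of information.  One more iteration and the range doubles.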

At what point can we have a discussion about the end-point? You probably could at the very start, but it's hardly a valuable discussion. The very end is probably too late - so it's somewhere in between. Sure, but where? This is the hard part, and it's a function of the number of external influences remaining. As we've seen from my cycling story above, when there's a large number of external influences (approximately 1 every minute) - we're looking at a 30% variation, regardless of where we choose to make a projection.  Clearly as we get closer to the end, there's less total impact, but the ratio of impact remains (mostly) constant.

Sadly for this story, there's not an easy answer to the question of "Are we done yet?".  The best advice I can give is to reduce the external impacts or at the very least be able to quantify them and reduce the problem to understanding your average moving speed.



Once more unto the breach

One of the (very few) things that I completely love about using a Mac for software development is the integrated command line which does a pretty good job of approximating a Unix environment.  This is great for me, and supplements how I work nicely. 

To get a similar environment under Windows has always been a great goal, but somewhat hindered by the complete shit that is cygwin.  So, as I commented in my last entry, I have started using VirtualBox to provide a great scripting environment, but I'm still stuck in the mire of not having a decent terminal client to access the command line shell.

There are a number of alternatives, of which SecureCRT is probably the best (an awesome application that I've owned for many, many years) but it's a commercial piece of software that many people may not feel the need to buy.  PuTTY is pretty good, but it's only a marginally better application than the standard Windows console application.

Frustration over the weekend finally got to the point of doing something about it, and I investigated a number of alternatives including Console2, Xshell, MinGW and finally settled on using MobaXterm (http://mobaxterm.mobatek.net/en/).  Great application with a number of awesome features that I've not completely finished exploring yet.  It comes free for personal use, with an upgrade to a professional version for 49 euros.  I'll keep using it for a while to see if I want to upgrade, but I might do it just to support the work.

Hope that helps some people become more productive.

Use of weapons

While I may rant on occasions about various desktop environments, I'm pretty neutral about them provided they are vaguely suited to use.  Now, I'm doing a whole bunch of stuff in Unix-land, and frankly either MacOS or Win7 is complete shit at dealing with this.  So, I've been looking at how I can do this without hurting my head too much.

My answer is "to the cloud".  Actually no, that's what everybody else is doing.  My real answer is "virtual machines".  Modern desktop systems are for the most part powerful enough to run a VM with a minimum of effort, and you can create/clone/destroy them as often as you want.  This is the real joy of dealing with a VM.  Right now I want to spin up a dev environment that mirrors our CI view of the world.  So, I create a VM, I install stuff.  I *clone* the VM and start using the cloned VM.  I can completely screw around with it - and if I fuck it up (or when I'm finished with it) I just discard it - and I can grab the template VM and start again.

The other benefit is that I don't have to worry about installing any form of UI on those VMs - because I can just create an SSH server, and ssh my way into the virtual machine and go to town.  I can use the "power" of the graphical display of my native desktop platform (with all the ability to cut and paste things into an email / a log / a whatever) while having a *real* Unix environment to work with.

Now, while the *real* Unix is compelling - the single biggest feature that makes it so desirable to use a VM is the create and destroy model.  I can even create a base template with all the packages installed, then use that to clone a couple of nodes and fire them up. 
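For the VirtualBox case, the create and destroy cycle can be scripted with the `VBoxManage` command line tool.  The sketch below only builds and prints the commands unless you turn `dry_run` off.  The VM names and the host port 2222 are made up, it assumes the clone uses NAT networking on its first adapter, and option spellings can vary between VirtualBox versions, so check them against your install.

```python
import subprocess

def create_commands(template, clone, ssh_host_port=2222):
    """Commands to clone a template VM and boot the clone without a UI."""
    return [
        ["VBoxManage", "clonevm", template, "--name", clone, "--register"],
        # Assumption: NAT on adapter 1; forward a host port to guest port 22.
        ["VBoxManage", "modifyvm", clone, "--natpf1",
         f"guestssh,tcp,,{ssh_host_port},,22"],
        ["VBoxManage", "startvm", clone, "--type", "headless"],
    ]

def discard_commands(clone):
    """Commands to power off a clone and delete it, disks and all."""
    return [
        ["VBoxManage", "controlvm", clone, "poweroff"],
        ["VBoxManage", "unregistervm", clone, "--delete"],
    ]

def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

run(create_commands("ci-template", "ci-scratch"))
run(discard_commands("ci-scratch"))
```

Between the two steps, something like `ssh -p 2222 user@localhost` gets you into the clone, where `user` is whatever account exists on the guest.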

I've been using VirtualBox on Windows and VMWare Fusion on MacOS.  Some of the reasons for these choices have been the ability to install some of the native tools, which make it easier to do cut and paste - and to share directories between the host and guest environments.  Some of this has been due to UI convenience (with VBox on Windows I can run a fullscreen Ubuntu for example, but VBox doesn't let me do that on MacOS - no support).

I think I'm going to standardise on using VirtualBox if I can work out the native directory sharing between the MacOS and Unix environment - but at this stage I may be using 2 different systems.

eaves.org has moved

We moved servers.

Seagate Momentus XT 500gb installation


I thought I'd let people know about my current experience with replacing my laptop hard drive, it may well be useful for others.

The starting configuration was a Seagate ?? 500gb 7200rpm - pretty standard laptop hard drive.    I like to replace my hard drives every 12 months because I find they are one of the largest sources of issue and heat death with a laptop.  For the princely sum of $100 it's normally a pretty good investment.

I really wanted to get a 500gb SSD drive, but there's none of those, and if there was, it would be retardedly expensive.    The next option would be to get a second drive, and use the SSD for boot + games - needing about 120gb, but laptops have a limiting factor of only 1 drive.  During my research I found that I could get a drive bay to replace the DVD, but that would be just messy - but a possible consideration next time around as I'd put the 128gb SSD in the primary drive spot, and use the "DVD bay" as my mechanical drive bay.

So, after dicking around for a bit and procrastinating I ended up getting a Seagate Momentus XT 500gb hybrid drive.  It allegedly has a bunch of SSD cache in the drive which makes it perform amazingly.  (More on that later).  All the reports indicate the startup, seek etc times are more indicative of a 10k rpm drive than a 7200rpm drive.  One of the downsides of putting a 7200rpm drive in the laptop is the additional heat it generates over a 5400rpm drive, but as we all know, more rpm = faster disk access = faster loading into Dalaran !

Now, rather than my previous upgrade paths which have been a "install new drive, install current version of windows and then copy everything over" I really didn't want to go down that path as my Win7 install is an upgrade, so I'd have to fuck around with getting Vista on first, then upgrade etc, etc.  So, I did some research on drive cloning and ended up using R-Drive Image.  I'd had enormous success in using the R-Drive recovery software and really liked their stuff, so off I went and cloned the drive.

Took about 4 hours, but it wouldn't boot - FUCK YOU.  I did some searching around and tried the Win7 repair - but still didn't work.  Some fuckery with geometry and stuff and I was running late for work and yeah - found out I probably could have recovered using some command line bootmgr re-installation.  I might try that next time.

So, during my searching I found that most HD manufacturers have a cut down version of HD cloning software with support for their drives, so I downloaded Seagate DiskWizard.  Installed, ran and off it went and changed my current drive so it would reboot into the supervisor style mode for doing the raw disk copy.  As it rebooted it couldn't find my original disk.  It's 11pm, the house is sleeping - choke-a-bitch-index of 11.  I try repair, I try and be calm, and the Win7 repair disk did a fine job.  Which was really fortunate for the Seagate team, or I'd be on a plane right now to stab some fuckers.  I'm starting to think some of my original issues might have been minor issues with my current drive that was cloned over.

Restarting the process at about midnight it takes about 7 hours to copy 100gb of system disk, 42gb of photos, 150gb of games and 50gb of random crap.  I wake up the next morning and it's just finished.    I swap around the drives and have the new one installed in my laptop.  I rename the partitions so all my current shortcuts etc are still working fine and boot up WoW to check things.  The Win7 boot sequence is a lot faster - probably 50% of the previous time.  I didn't do any timings (sue me) but I remember "watching Win7 start up", and now it's "hurry up and log in" - so probably down from 20 seconds to 10 seconds.

First thing I notice is how quiet the new drive is.  I check the temperature and it's running about 15 degrees cooler while WoW is in "normal mode".  That's a good sign (65 degrees down from 80).  The next thing I notice is how much faster it is.  WoW starts in about 50% of the time.   There's no delay/lag on the login screen (which is all local).

So far - so good, and for the premium of $60 it seems like it may have been a good investment.  (Standard 500gb was about $100, this was $160)