Wednesday 31 August 2011

How much does your slow machine cost your company?


The problem

I'm currently working at your standard large company. The computers are managed centrally and are replaced every 3 or 4 years or so. The machine I'm working on is a Dell running 32-bit Windows 7 with 3.21 GB of RAM, an Intel Core 2 Duo CPU at 3GHz and on-board graphics running 2 monitors. Not a terrible machine, you may say, but far, far from good enough as a developer machine.

We are developing an MVC 3 enterprise web app running on SQL Server 2008, LINQ to SQL, VS2010, ReSharper 6, TFS integration. JavaScript tests, unit tests, integration tests, feature tests and acceptance tests make up a sizeable testing suite. As you can imagine, this set-up can put quite a load on the machines, especially as our team's velocity is still good and the code base is growing quickly.

We are finding that the machines hang periodically. All the dev machines are quite slow and all exactly the same spec, but for some reason a few are terribly slow, and we avoid pairing on those machines. Then there are the 30-60 second waits for Visual Studio builds, the 5 minutes for the check-in builds, the time to compile and run tests as you are doing TDD, the time our tests run for, and the 10+ seconds for Visual Studio and ReSharper to refactor or find things. It all adds up, but how much does it cost? Not only does it cost in terms of wasted time but also in context switching. For example, you're doing TDD: you write a test, do some coding and hit run test, but have to wait 30+ seconds for it to run. This takes long enough to break your flow. You have a quick think about something else, and then you realise the test has run and you need to switch your attention back. You might have a quick chat about something else with your pair.

We know it's hurting our velocity but without numbers it's difficult to convince management of the true costs.

So what did we do?

We took a stopwatch, kept it with us all day and recorded all the time we spent waiting for the computer to do something - from opening apps, running builds and tests, to searches and refactorings in Visual Studio. Any time at all where the developer had to wait for the machine to work, be it 5 seconds or 5 minutes, the stopwatch was running. It took quite a lot of discipline. The results were startling.

Results

I did this for a week, every day, and so did a colleague. Our results were very similar: on average we were sitting unproductive for a cumulative 30 to 60 minutes a day, with a couple of days at 15 minutes (mainly due to days with meetings) and a couple of days at a whopping 75 minutes (these were days when, perversely, we were going quite quickly and getting things done, but running builds/tests/check-ins all day takes time).

So let's say 40 minutes wasted per pair per day. That sounds like a lot, but it's what the stopwatch said. You try it for a day. What are your numbers?
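We used a physical stopwatch, but if you would rather jot each wait down and add them up at the end of the day, a few lines of Python would do. This is only a sketch: the function name `tally_waits` and the sample entries are made up for illustration and are not our real log.

```python
from collections import defaultdict


def tally_waits(entries):
    """Add up (activity, seconds) stopwatch entries.

    Returns the seconds spent waiting per activity and the total in minutes.
    """
    seconds_by_activity = defaultdict(int)
    for activity, seconds in entries:
        seconds_by_activity[activity] += seconds
    total_minutes = sum(seconds_by_activity.values()) / 60
    return dict(seconds_by_activity), total_minutes


# Hypothetical entries for part of a morning (illustrative only)
sample_log = [
    ("build", 45),
    ("run tests", 30),
    ("check-in build", 300),
    ("build", 45),
    ("refactor", 15),
]

by_activity, minutes = tally_waits(sample_log)
print(by_activity)
print(minutes, "minutes spent waiting")
```

Splitting the total by activity is optional, but it shows whether builds, tests or check-ins are eating most of the time.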

The machine

We are running Windows 7 on a machine that was originally running Windows Server 2003, so the requirements on the machine are different. We have turned off all the Aero UI elements (we have on-board graphics). We've turned off the virus checking (controversially, as this is a corporate environment), and followed numerous tutorials on the internet about improving the speed of Windows, but all to no avail.

If we had a faster, better, newer machine, then how much would we gain? Well, that's subjective, but for argument's sake, say we had a great machine and could cut all the waiting/processing time in half (*1)

The Costs

What is the cost to the business of one dev per day? It depends on your company, your devs, their seniority, your support staff, building costs, electricity etc. There are many factors, but let's say £200 (this is a conservative estimate *2)

Our team consists of 10 devs, 3 UI/UX/designers, 2 QAs/testers, 2 BAs (analysts), 1 PM (manager) = 18 people. If we assume they each cost a nominal £200 per day, that is £3600 a day for the team as a whole.

Devs are the constraint on the throughput of the system (the bottleneck). That's 10 people losing 40 mins a day, so 400 minutes a day of lost development effort. We said a fast machine might cut this in half, so we have 200 mins of needlessly lost development effort per day. How much does that cost?

We work an 8 hour day, so 10 devs x 8 hours x 60 minutes = 4,800 minutes of dev time available per day.

So the cost per minute of throughput is £3,600 / 4,800 = £0.75 per minute. (Wait, you say, you took the cost of the team and divided it by the time of the developers? I shall explain *3)

Finally £0.75 x 200 mins = £150 lost by the business every day
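The arithmetic above can be sketched in a few lines of Python so you can plug in your own figures. The default values are the numbers from this post; the function and parameter names are just for illustration.

```python
def daily_cost_of_waiting(
    team_size=18,              # everyone on the team, not just the devs
    cost_per_person_day=200,   # nominal cost in pounds per person per day
    devs=10,                   # the bottleneck
    hours_per_day=8,
    wasted_mins_per_dev=40,    # the stopwatch figure
    recoverable_fraction=0.5,  # assumption: a fast machine halves the waiting
):
    """Pounds lost per day to waiting that a faster machine could avoid."""
    team_cost_per_day = team_size * cost_per_person_day             # 3,600
    dev_minutes_per_day = devs * hours_per_day * 60                 # 4,800
    cost_per_dev_minute = team_cost_per_day / dev_minutes_per_day   # 0.75
    recoverable_minutes = devs * wasted_mins_per_dev * recoverable_fraction  # 200
    return cost_per_dev_minute * recoverable_minutes


print(daily_cost_of_waiting())  # 150.0
```

Every input is a guess to some degree, so it is worth calling the function with a pessimistic and an optimistic set of numbers rather than trusting a single result.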

Conclusion

To sum up, it is costing the business £150 per day to have 10 developers using slow machines. Given we are pairing on all dev work, we only need 5 new fast machines. I know you can get a great machine for £1,000 (*4), which will be future-proofed for the next 2 years or so. That's £5,000 in total.

£5,000 / £150 ≈ 33 days. That's the number of working days it takes to pay off the new machines by not losing the developers' time waiting every day.
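The payback sum is simple enough to sketch as well. The defaults below are the figures used in this post, and the second call uses the cheaper price mentioned in note 4. The function name `payback_days` is only illustrative.

```python
def payback_days(machines=5, price_per_machine=1000, daily_saving=150):
    """Working days until the daily saving has covered the purchase price."""
    return machines * price_per_machine / daily_saving


print(round(payback_days(), 1))             # 33.3
print(payback_days(price_per_machine=600))  # 20.0
```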

So is that £1,000 for a new dev machine worth it? I conclude it is, if the project is longer than 33 days. Is yours?



Appendix
  • Theory of Constraints on Wikipedia.
  • The Goal (novel) is an excellent book that describes the theory of constraints at a manufacturing plant in a story format.
  • 1 - We will only know how much better a great machine would be when we get one and use it on our project, but I suspect given the state of our current machines it will be half the time.
  • 2 - This is a low estimate, especially as I'm a consultant and get charged out at a higher rate than £200/day, but this is likely to be a conservative estimate for any company employing developers.
  • 3 - The theory of constraints says that the cost of the bottleneck is equal to the cost of the system as a whole. The goal of the system is to produce finished, production-ready functionality. Which part of the system is the bottleneck, stopping you from delivering more functionality? On our team it's us, the developers. We only release what is coded up every week. Nothing the testers or analysts can do can make us deliver more. So the output of the system is governed by the velocity of the devs. The cost of the team (in our case £3,600 a day) is the cost of the 20 story points we deliver each week, and it's the devs who control that velocity. That's not to say the other roles are not important. They are critically important, but they are not the bottleneck.
  • 4 - Yes you should get a good machine for that. You could actually get a great computer for £500 - £600. We have the monitors and peripherals, and given we are a big company we would qualify for Dell's corporate rates.

15 comments:

  1. Staff get paid, jobs get done, new hardware costs money -- based on my experience working in larger companies the solution would simply be to make your staff work 40 minutes longer per day or take 40 minutes less at lunch. Ignoring this and assuming staff time is fixed:

    You have to factor in a bunch of other items:
    * Machines are assets and need to be written off (+costs time)
    * Machines can't simply be thrown away; you have to pay to dispose of them (especially in corporate environments)
    * Machines don't simply replace themselves; they have to be unpacked and set up, which is more time and money
    * Dev machines contain lots of small configurable things (visual studio, iis, svn, check outs, etc); setting up a new machine takes time -- if you pick similar hardware you could just clone the disk and hope windows deals with the new hardware (potential issue with windows licenses)

    The best hardware upgrade I ever did was switching to an SSD. Considering these can be had for £150 each (for ~120GB), it'd be interesting to find out whether your results whittle down to IO latency or just sheer processing, to determine if an SSD would be the solution.

    I've also seen situations where large companies weren't even using gigabit ethernet so this is also something to look into; especially if you're sat waiting for check ins etc.

    Finally, the last one -- which is usually the hardest -- is tweaking the end user and their behaviour: are there tasks that could be accomplished differently? In a different order? Could you farm off some tasks to automated systems such as a continuous integration server? Etc.

    Anyway, nice article. I do this sort of stuff all the time for my own work, I'm freelance now and making the decision between time and money is starting to become a huge factor.

    Chris

  2. Wow. You should outsource it... too pricey for a team...
    Actually the PC spec is good. The PC hangs because you don't have a graphics card... just that.
    I wouldn't suggest the latest Core i5/i7 for development. A Core 2 Duo is enough, and faster than the latest Core i5. The reason: lower bus speed??

    Replies
    1. The PC hanging is most likely because it is swapping and nothing to do with the graphics card. I had a similar situation (with company not wanting to spend money), and adding a hardware RAM drive for the swap file noticeably helped so I'm sure an SSD would help even more. Ideally upgrade the RAM from 4GB to 8GB or more, but if like the writer you are on a 32 bit OS then that isn't really an option.

  3. I did similar estimates lately. In my country we have similar stakes but in different currency (3-4x less in practice), thus it is even harder to convince anybody to invest in hardware (which is even more expensive here).

    I have a similar configuration to the one you mentioned, and it is best to invest in a 2nd generation SSD (~300 MB/s; for 3rd generation you would need SATA 3 at 6 Gb/s). A new workstation with an i7-class CPU, 8 GB RAM and an HDD would still hang sometimes, which is never the case with an SSD.

  4. Not to mention the effect on flow...

  5. A good point of view. The problem is the subjectivity of it: doing a study and being able to demonstrate that gain in terms that "managers in suits" will accept.

  6. Thanks for doing the arithmetic. I'm amazed at how many times I recommend this technique to clients and they choose not to do it. I measured the time a team lost fighting with ClearCase, and it averaged 15%, or roughly 6 hours per week per person.

    I also ask this of my clients: "If I could give you a last-second 10% extension on your project, would you take it?" They always say yes, but they don't spend the $5-10k it takes to get that extension. I release the outcome.

  7. Hi
    Nice to hear this story. I'm working as a consultant for various customers and I always face performance problems at the customers' sites. Often our customers come from the manufacturing or industry sector and the IT there doesn't see why developers would need better machines. I encouraged my team to do the stopwatch(ing) and the maths. When we had the numbers we got new machines in a relatively short time. We went from dual core, 4 GB RAM, IDE to i7 quad hyperthreading, 8 GB RAM, Intel SSD. We are now saving more than 50% runtime when running FxCop, unit tests and acceptance tests. That rocks!

    Now we also need to improve the execution speed of our tests. We have a lot of potential there. And also our CI server and agents could benefit from SSDs.

    Happy (fast) coding

  8. Posted by MikeCSH on: http://news.ycombinator.com/item?id=2960319

    I've actually done exactly this exercise myself having been constantly frustrated with the development platform in use at my company (Similar Dell PCs, Win 7, VS2010, RS6 etc.). Additionally we have a slow VPN to our servers which are hosted at our main office. There have been days where the total wasted time is >25% of my day.
    I'd add an additional cost into the analysis which is the frustration of engineers working under these conditions and the risk of them leaving to find somewhere that takes this more seriously.
    I'll also register my continued surprise and disappointment at the performance of this typical MS development stack on a machine that should be more than up to the task (3 GHZ Core 2 Duo, 4 GB RAM).

  9. Anonymous 5/9/11 13:22

    I do admire your pursuit of the rational thought process, but...

    There is a fifth dimension beyond that which is known to man.
    It is a dimension as vast as space and timeless as infinity.
    It is the middle ground between light and shadow,
    between science and superstition,
    and it lies between the pit of man's fears and the summit of his knowledge.
    This is the dimension of imagination.
    It is an area which we call . . . the Corporate Purchasing Zone.

    The group that I am working in is experiencing much the same situation, we are on 32bit Win XP with a fleet of dell machines that have dodgy video drivers... Visual Studio dies at the drop of a hat... add in a well known global IT services company as the manager of desktop environment and it is a wonder anything gets done.

  10. Anonymous 5/9/11 13:52

    Hi Damian,

    thanks for this post - it reminds managers of their responsibility to design a system that works economically well.

    However, it looks to me as if your calculation contains two assumptions that are wrong.

    1) You write "Devs are the constraint on the throughput of the system (the bottleneck)." and then, you begin to calculate the daily cost per developer as £200. This is much too low because you use cost accounting instead of throughput accounting.

    Theory of Constraints says that a bottleneck is not a constraint for cost but a constraint for *throughput* of the system. So, let's change the accounting method: How large is your *throughput in pounds sterling* paid per release?

    Let's say you develop a medium sized application with your 10 developers. You release twice a year and (say) your customers pay £2 million per release. Now, what is the cost (in terms of lost throughput) if the developers get one hour behind schedule? The calculation works like this: One release = 6 months = 120 working days = 960 working hours. Your company loses £2 million / 960 hours = £2,083 each hour on the bottleneck resource. If the developers really are your company's bottleneck and if you have 10 of them, each developer costs £208 per hour(!) in lost throughput.

    That was about assumption #1. (By the way: How do you know that the devs are the bottleneck and not somebody else, e.g. testers?)

    Now for the other assumption:

    2) You say "That's 10 people losing 40 mins a day, so 400 minutes a day of lost development effort.". That is the right answer to the wrong problem. You point out how much capacity your developers lose because of slow machines. If your developers were a motorway, this would translate to "lost square miles for cars". But: The real problem on a motorway is not "how many cars can you place on it?" but "how many cars can move on it per hour?" and "how long will each car spend in queues?".

    Let's translate that back to software development. Software development is a flow problem. If you allow your developers to use up the entire time of day for development, how much increase would you get in throughput? And, how much time does the work spend in queues? By how much can this queueing time be reduced by faster machines? How about limiting work in progress - could this have a higher impact than using faster machines?

    ---

    From my point of view, you need to have more flow-oriented instead of capacity-oriented data to support your point of view that faster machines would be the solution. Continue to get that data and translate it (as you did) into money, then you're on the right track.

    Cheers
    Matthias

  11. We see the same thing doing large scale Java web app on Ubuntu and/or Fedora with IntelliJ IDEA 10, on similar machines but with 8 GB RAM. We've bought a few quad core XEON machines with 12GB DDR3 RAM and 10k to 15k RPM SAS drives and the difference is staggering. Maven builds cut in half or better, IntelliJ doesn't sit around indexing forever, etc...

    The faster SAS drives seem to be the biggest bang for your buck in our experience - just have enough RAM to avoid swapping too.

  12. I think it's even worse: the longer it takes to finish a task, the more likely a version-control conflict, and the worse it will be. So the actual price is more than doubled when the time doubles. And, on top of this, like you said, is the unmeasurable loss of flow, which is even more expensive in difficult projects than the measurable costs.

  13. Anonymous 5/9/11 15:00

    Did you try to measure how long it takes to get back into flow after the interruptions that derail you, and factor that into productivity more broadly than just 'lost 5 minutes waiting'? It might be 'lost 5 minutes waiting, lost half an hour getting back to where we were before those five minutes'.

  14. Anonymous 6/9/11 20:13

    Great analysis and, whatever the figures, your point is well made. FWIW, I would use a fully-rolled-up cost of around £500 per day for a dev (includes overheads) and the TCO for a "desktop" (including those procurement, deployment and support costs, disposal, etc.) being in the region of £1500.
