Rescuing web apps (part 2)
In the second part of this post on rescuing web applications, I'll go through some techniques for refactoring the codebase of an existing project. This is for the case where you've decided that there is hope for your project, and that the best course is to take the time to fix what's already there, as opposed to starting again from scratch.
Here are some general tips for refactoring a large codebase.
1. Run static analysis tools on your project
There's a reason that this is step 1. Can you quantify the problems with your codebase? If not, there are tools available for all of the most popular web languages that will help you to determine the most problematic areas of your code. The best way to use these tools is by tracking the change over a period of time, e.g. by setting up your project with continuous integration, running the static analysis tools on each build and keeping the output. It's a great feeling to see the health of your project tangibly improve as you refactor.
Check if there are tools available for your language that track the following metrics:
- Code coverage - absolutely essential for determining how much of your code is covered by tests
- Code smell/mess detector
- Code similarity/duplication (aka copy-paste detector)
- Cyclomatic complexity (i.e. nested if statements and loops)
With these few metrics, you can get some numerical snapshots of the health of your app. Improvements in these metrics should be fed back to the team, because it's a good source of encouragement.
2. Rank the areas that need refactoring
There are 3 factors that I take into account when deciding which part of the codebase to refactor first:
- The results of the metrics from step 1
- The most critical features of the app
- Which areas of the codebase which have historically been the buggiest or most problematic
This means that the biggest spaghetti mess of code may be relating to a fairly unimportant feature, so you may choose to write more unit tests for the super critical payment system instead.
In your team, break down each section of code that needs refactoring, and give them a rank based on their importance and current code quality. This gives you your plan of attack. When you've decided on your first task, move on to the next section.
3. Work out the scope of the problem
Is the problem purely in the quality of the code? I.e. does the feature do what it's supposed to or is it fundamentally broken? Are there working and useful tests around the feature?
If the feature doesn't work, or needs drastic modification, it may be that you need to consider rewriting that section entirely.
4. Test first
If your test suite doesn't fully cover the code that you're refactoring, now's the time to write tests. I'll assume that you're able to implement these three levels of test:
- High level end-to-end (acceptance) tests, e.g. with a tool like capybara or selenium
- Integration or functional tests, e.g. testing the behaviour of controllers or groups of objects
- Unit tests, i.e. testing individual objects
Unit tests
The most useful tests in refactoring are unit tests. It's perfectly reasonable to write unit tests that cover practically every path through your code (save for an infinite number of input values). However, I've found that it's rarely the case that we can keep the original unit tests when performing a large-scale refactor. This is because it's unlikely that your problems are confined to a single class. Often during a refactor, new classes are introduced, object APIs change and existing classes are deleted entirely. Delete the related unit tests if you know that you're going to have to completely change the classes involved, and rely on higher level tests instead.
Integration/functional tests
Integration/functional tests are the next most useful, as they will tell you whether your app is still working at a higher level as you refactor. For instance, a controller will orchestrate a series of objects and render a view. Make sure you have tests that cover it; specifically tests that check the outputs as opposed to the behaviour, since the behaviour will change during the refactor. Integration tests are less informative than unit tests, since it's often impractical to test every path through the code at this level. As more objects become involved in the stack, the number of possible inputs, outputs and paths increase exponentially. Nevertheless, an integration test is infinitely preferable to no test at all.
End-to-end acceptance tests
If, for whatever reason, you can't rely on unit tests or integration tests, make sure that you have a few end-to-end tests. A possible scenario for this would be where you have a feature that needs a total rewrite, including big UI changes that span multiple pages of your application. For instance, if your controllers, routes and models change, then you have no way of using existing integration or unit tests. Just bear in mind that the more disruptive the refactor is, the less certain you can be that you won't have introduced more bugs in the process.
In summary, the lower level tests are the most useful, and will give you quicker and more accurate feedback about the impact of your refactor. As the scope of your refactor increases you'll be forced to rely on higher level tests, which will give less certainty about the positive and negative effects.
If you don't have any tests covering the existing feature, I'd suggest writing some end-to-end tests that confirm the current behaviour. Then, if possible, write some integration tests. Work your way down from high- to low- level tests, stopping at the point where your refactor would break your new tests.
5. Test a bit more
When you finally start your refactor, you'll either be writing new tests from scratch or relying on existing tests, and ensuring that passing tests don't fail after the refactor. It (hopefully) goes without saying that new code should be thoroughly tested, to avoid ending up in the same situation as before.
The new code doesn't have to be perfect, but if it's well tested then the likelihood is that it's well-factored, and can therefore be easily manipulated in the future if necessary.
6. Manually check the result
We often place so much faith in our test suite that we cut corners when testing the app manually. Just remember this: you can write code with 100% test coverage that does absolutely nothing. Code coverage isn't a measure of feature completion, so don't skip the vital stage of manually checking that everything still works together.
7. Profit?
The benefit of a large refactor is rarely seen in the short-term, apart from in the minds and hearts of the developers working on it. However, the chances are that it will save time in the future, and therefore money.