Obviously, I do not need to convince anyone, that good code review is crucial part of software development.
But did anyone told you how to do it properly? How do you assure that each code review which you are making includes all factors of good code review. What are those factors? I thought about those questions a while ago and I thought that it will be good sharing my findings with you.
Checkpoint list for those on-the-go:
- Can I explain purpose of each line of code which I am reviewing?
- Is the code properly formatted (braces, tabs, alignment)?
- Are variables, classes, methods named and ordered properly?
- Does the user updated/created documentation (if it was needed)?
- Is there a test for this code? Do tests cover happy, bad, corner paths?
- Can I think of any other corner path? How I can break this code?
- Does the code follow
SOLID, DRY, YAGNI, FAIL FASTpatterns?
- Can code reuse anything from existing code (and other way around)?
- Are there any performance issues introduced by the code?
- Are there any safety issues that we need to take into consideration?
Version for more curious ones, with few minutes to spare.
Text below as a whole is made of quotes from excellent e-book: “What to look for in a Code Review” by Trisha Gee.
First and foremost:
Does the code actually do what it was supposed to do?
Formatting & Style:
Where are the spaces and line breaks?
Are they using tabs or spaces?
How are the curly braces laid out? If the codebase has a mix of standards or design styles, does this new code follow the current practices?
It would be rare that new code, whether a bug fix or new feature, wouldn’t need a new or updated test to cover it.
Can I think of cases that are not covered by the existing tests? Often our requirements aren’t clearly specified. Under these circumstances, the reviewer should think of edge cases that weren’t covered in the original bug/issue/story. If our new feature is, for example, “Give the user the ability to log on to the system”, the reviewer could be thinking, “What happens if the user enters null for the username?” or “What sort of error occurs if the user doesn’t exist in the system?”.
Do the tests match the requirements?
One step beyond simply “is there a test?” is to answer the question, “is the important code actually covered by at least one test?” .
Can I understand what the tests do?
Do the tests cover a good subset of cases?
Do they cover happy paths and exceptional cases?
Is the code in the right place? For example, if the code is related to Orders, is it in the Order Service?
How does the new code fit with the overall architecture?
Does the code follow SOLID principles, Domain Driven Design?
What design patterns are used in the new code? Are these appropriate?
Could the new code have reused something in the existing code?
Does the new code provide something we can reuse in the existing code?
Does the new code introduce duplication? If so, should it be refactored to a more reusable pattern, or is this acceptable at this stage?
Is the code over-engineered? Does it build for reusability that isn’t required now? How does the team balance considerations of reusability with YAGNI?
Common problems like SQL Injection or Cross-site Scripting can be found via tools running in your Continuous Integration environment. You can also automate checking for known vulnerabilities in your dependencies via the OWASP Dependency Check tool.
One of the areas where security vulnerabilities can creep into your system or code base is via third party libraries. When reviewing code, at the very least you want to check if any new dependencies (e.g. third party libraries) have been introduced. If you aren’t already automating the check for vulnerabilities, you should check for known issues in newly-introduced libraries.
When you add a new URI or service, you should ensure that this cannot be accessed without authentication (assuming authentication is a requirement of your system). You may simply need to check that the developer of the code wrote an appropriate test to show that authentication has been applied. You should also consider that authentication isn’t just for human users with a username and password. Identity might need to be defined for other systems or automated processes accessing your application or services.
If the code under review is sending data on the wire, saving it somewhere, or it is in some way leaving your system, if you don’t know whether it should be encrypted or not, try and locate someone in your organisation who can answer that question.
Secrets are things like passwords (user passwords, or passwords to databases or other systems), encryption keys, tokens and so forth. These should never be stored in code, or in configuration files that get checked into the source control system. There are other ways of managing these secrets, for example via a secrets server. When reviewing code, make sure these things don’t accidentally sneak into your VCS.
Could the code benefit from lazy loading?
Can if statements or other logic be short-circuited by placing the fastest evaluation first?
Does this piece of functionality have hard performance requirements?
Does the code using a thread-safe data structure where it’s not required?
Is there a lot of string formatting? Could this be more efficient?
Does the code using a data structure with poor performance for the common operations? For example, using a linked list but needing to regularly search for a single item in it.
Does the code use locks to access shared resources? Could this result in poor performance or deadlocks?
Does the new code introduce avoidable performance issues, like unnecessary calls to a database or remote service?
Unnecessary network calls Like databases, remote services can sometimes be over-used, with multiple remote calls being made where a single one might suffice, or where batching or caching might prevent expensive network calls.
Calls outside of the service/application are expensive Any use of systems outside your own application that require a network hop are generally going to cost you much more than a poorly optimised equals() method. Consider: Calls to the database The worst offenders might be hiding behind abstractions like ORMs. But in a code review you should be able to catch common causes of performance problems, like individual calls to a database inside a loop.
Does the author need to create public documentation, or change existing help files?
Does the code close connections/streams? It’s easy to forget to close connections or file/network streams.
Developers who come to the code later, or who use any API exposed by your system, will make certain assumptions based on data structures. If the data returned from a method call is in a list, a developer will assume it is ordered in some fashion.
Can I understand what the code does by reading it?
Are the exception error messages understandable?
Are confusing sections of code either documented, commented, or covered by understandable tests (according to team preference)?
Reflection: in Java is slower than doing things without reflection.
Timeouts: When you’re reviewing code, you might not know what the correct timeout for an operation is, but you should be thinking “what’s the impact on the rest of the system while this timeout is ticking down?
Parallelism: Does the code use multiple threads to perform a simple operation? Does this add more time and complexity rather than improving performance?
Is the code using the correct data structure for a multi-threaded environment?