The case for snapshot testing

- 18 mins

I first heard the term ‘snapshot testing’ when I began working with the JavaScript libraries React and Jest. React is used to create more or less complex HTML DOMs. In order to unit test the DOM creation, we employed snapshot testing: we let the code under test produce its actual output and then compare that output against a persisted version from a prior test execution. If a test is executed for the first time, no comparison happens; instead, the output is persisted as the baseline for all subsequent test executions.

This technique provides a strong guard against accidental code changes: when your code under test, for some reason, creates a different object, it will no longer match the snapshot and the test will fail. A sophisticated test framework implementation will then provide you with a detailed diff of the expected (i.e. the persisted snapshot) and the actual test result.

I liked this approach a lot because it made highly valuable assertions so easy to get, so I wanted to use it in my Java projects as well.

Of course, you can achieve the same with ‘classic’ unit tests using plain assertions. Let’s examine a little example to see how snapshot testing can bring value to any test suite.

Classic unit tests with assertions

Assume we have some business logic that creates a Person object. For the sake of simplicity, in this example it’s just a static one, but in reality it could be loaded from a database or be determined from user input:

public Person determinePerson() {
    return new Person()
            .setName("Simon")
            .setSurname("Taddiken")
            .setBirthdate(LocalDate.of(1777, 2, 12))
            .setAddress(new Address()
                    .setCity("Bielefeld")
                    .setCountry("Germany")
                    .setStreet("Gibtsnicht-Straße")
                    .setNumber("1337")
                    .setZipCode("4711"));
}

Let’s write a unit test which expects our business logic to return a certain Person object:

@Test
void testDetermineCorrectPerson() {
    final Person actual = codeUnderTest.determinePerson();
    
    final Person expected = new Person()
            .setName("Simon")
            .setSurname("Taddiken")
            .setBirthdate(LocalDate.of(1777, 1, 12))
            .setAddress(new Address()
                    .setCity("Bielefeld")
                    .setCountry("Germany")
                    .setStreet("Gibtsnicht-Straße")
                    .setNumber("1337")
                    .setZipCode("4711"));
    
    assertThat(actual).isEqualTo(expected);
}

The code under test produces some Person and we compare it using equals against a hard-coded instance. Of course, this only works if all the involved objects and attribute types correctly implement equals. This is not necessarily the case with more complex data models or objects that are not under our control.
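
To illustrate what that requirement entails, here is a minimal sketch (field names taken from the example, everything else assumed for illustration) of the equals/hashCode plumbing every type in the object graph would need:

```java
import java.util.Objects;

// Minimal sketch: every type in the object graph needs equals/hashCode
// like this for assertThat(actual).isEqualTo(expected) to work.
class Address {
    private String city;
    private String street;

    Address setCity(String city) { this.city = city; return this; }
    Address setStreet(String street) { this.street = street; return this; }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof Address
                && Objects.equals(city, ((Address) obj).city)
                && Objects.equals(street, ((Address) obj).street);
    }

    @Override
    public int hashCode() {
        return Objects.hash(city, street);
    }
}

class Person {
    private String name;
    private Address address;

    Person setName(String name) { this.name = name; return this; }
    Person setAddress(Address address) { this.address = address; return this; }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof Person
                && Objects.equals(name, ((Person) obj).name)
                && Objects.equals(address, ((Person) obj).address);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, address);
    }
}
```

Multiply this by every attribute and every nested type, and it becomes clear why relying on correct equals implementations everywhere is fragile.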

Now, what if the test fails? The failure message will contain some details about the compared objects, but only if all involved objects properly implement toString, which, again, is not necessarily the case for every data model.

Expecting:
 <Name: Simon
Surname: Taddiken
Birthdate: 1777-02-12
Address: Street: Gibtsnicht-Straße
Number: 1337
Zip: 4711
City: Bielefeld
Country: Germany

>
to be equal to:
 <Name: Simon
Surname: Taddiken
Birthdate: 1777-01-12
Address: Street: Gibtsnicht-Straße
Number: 1337
Zip: 4711
City: Bielefeld
Country: Germany

>

It’s hard to spot the difference right away, and it is impossible if the objects do not properly implement toString:

Expecting:
 <Person@df470986>
to be equal to:
 <Person@df471146>

One might argue, especially for large objects, that we should write separate tests/assertions for separate attributes of the objects. In the extreme, one test case per attribute:

@Test
void testDetermineCorrectCity() throws Exception {
    final Person actual = codeUnderTest.determinePerson();

    assertThat(actual.getAddress().getCity()).isEqualTo("Bielefeld");
}

@Test
void testDetermineCorrectStreet() throws Exception {
    final Person actual = codeUnderTest.determinePerson();

    assertThat(actual.getAddress().getStreet()).isEqualTo("Gibtsnicht-Straße");
}

When one test fails, we know exactly what is wrong because each test has exactly one assertion. But you can see where this is going: for large objects with hundreds of attributes, you would need to write hundreds of test cases. You could combine multiple assertions into one test case, but that is discouraged because only the assertions up to the first failing one are executed, which might hide further problems.
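
A common middle ground (AssertJ’s SoftAssertions, for example, work along these lines) is to collect all failures and report them together. A minimal hand-rolled sketch of the idea, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Minimal sketch of 'soft' assertions: collect every mismatch
// and fail once at the end, instead of stopping at the first failure.
class SoftCheck {
    private final List<String> failures = new ArrayList<>();

    void checkEquals(String description, Object expected, Object actual) {
        if (!Objects.equals(expected, actual)) {
            failures.add(description + ": expected <" + expected + "> but was <" + actual + ">");
        }
    }

    void verify() {
        if (!failures.isEmpty()) {
            throw new AssertionError(failures.size() + " assertion(s) failed:\n"
                    + String.join("\n", failures));
        }
    }
}
```

This reports all mismatches at once, but you still have to enumerate every attribute by hand.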

There is also another, not so obvious problem with the test cases shown so far: they heavily depend on the structure of the data under test. Let’s suppose the structure changes and our Person now has two addresses:

public Person determinePerson() {
    return new Person()
            // ...
            .setLivingAddress(new Address()
                    .setCity("Bielefeld")
                    .setCountry("Germany")
                    .setStreet("Gibtsnicht-Straße")
                    .setNumber("1337")
                    .setZipCode("4711"))
            .setWorkingAddress(...);
}

In either style of ‘classic’ unit testing shown here, you’d have to adjust a whole bunch of test cases: either you need to change all the single assertions from .getAddress to .getLivingAddress and add a lot of new tests for .getWorkingAddress, or you need to extend all the hard-coded expected objects to also contain the expected workingAddress so that the equals comparison works again.

To summarize:

In my experience, the key problem when writing tests for code that deals with complex data models is defining the expected output. You can write builder classes, but they are still cumbersome to use when you have a lot of nested objects. And the approach falls apart when there are structural changes and you have to adjust a lot of builder invocations.

Now…, what if the test could generate its expected output itself?

Snapshots to the rescue

This might sound like a mad idea at first: taking the test’s own output as the baseline to compare subsequent test results against. But if you think about it, it really isn’t mad. Unit tests are there to guard against accidental changes, not necessarily to verify that your code is correct from the beginning.

Consider the following workflow of writing code and test cases:

  1. Write the code that produces a complex object as result
  2. Write a unit test in which you serialize the complex test result into a json string
  3. If this is the first execution of the test, you write this json string to a file
  4. Important step: manually examine the file to determine whether your code produced the correct results. If not, manually modify the json to contain the expected results
  5. Every subsequent execution of the test case compares the json of the actually produced object against the persisted json in the file
  6. If the comparison fails, a detailed difference of the expected and the actual json string is provided
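
The mechanics of steps 3 and 5 can be sketched in a few lines of plain Java (a simplified illustration only; SnapshotAssert is a hypothetical helper, not part of any library):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch of the snapshot mechanics: the first run persists
// the serialized result, later runs compare against the persisted file.
class SnapshotAssert {

    static void assertMatchesSnapshot(String serializedActual, Path snapshotFile) {
        try {
            if (!Files.exists(snapshotFile)) {
                // First execution: persist the result as the baseline (step 3).
                Files.createDirectories(snapshotFile.getParent());
                Files.writeString(snapshotFile, serializedActual);
                return;
            }
            // Subsequent executions: compare against the baseline (step 5).
            final String expected = Files.readString(snapshotFile);
            if (!expected.equals(serializedActual)) {
                throw new AssertionError("Stored snapshot doesn't match actual result.\n"
                        + "expected:\n" + expected + "\nactual:\n" + serializedActual);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real implementation additionally needs serialization, proper diffing, snapshot updating and orphan detection, which is exactly what the library discussed below provides.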

If you follow this approach, you only write a single assertion, but you get a test case that is as valuable as a hundred test cases. You also get the additional benefit that your test cases become independent of the actual data’s structure: if the structure changes, you just delete the persisted snapshot and let the test regenerate the output file. The main benefit: you do not have to manually set up the complex expected result object. You let the test generate one, check once whether it is correct and then use it as the baseline for all future test executions.

There is an additional benefit: the persisted snapshot files contain complete, valid results of your API, making their structure and contents visible to anyone working with the code base. Especially when writing web services which return structured json/xml data anyway, you can have real responses of your service as snapshots. These files can be used to explain your automated tests to your product owners: “If you provide these inputs, this will be the result” – and they will immediately understand it and gain trust in the automated test (because they know what real responses are supposed to look like).

The described workflow is a bit tedious to implement yourself, as you’d have to deal with a lot of technical details and edge cases (serialization, comparison and diffing, where to store snapshots, automatic updates of snapshots, detecting orphaned snapshots, etc.). It would be nice to have a library deal with all the nasty details and integrate with your favorite unit testing framework.

And here it is: https://github.com/skuzzle/snapshot-tests

Let’s see how to write some simple snapshot assertions in Java using JUnit5:

import static de.skuzzle.test.snapshots.data.json.JsonSnapshot.json;    // import json snapshot format

@EnableSnapshotTests                                                    // enable the extension
public class SnapshotsTest {

    @Test
    void testAsJsonTextCompare(Snapshot snapshot) throws Exception {    // inject a Snapshot instance into the test
        final Person actual = codeUnderTest.determinePerson();
        snapshot.assertThat(actual).as(json).matchesSnapshotText();     // perform the assertion
    }
}

There are two possibilities for comparing snapshots. The first one (.matchesSnapshotText()) does a plain comparison of the serialized strings and prints a diff of the actual and expected results.

org.opentest4j.AssertionFailedError: Stored snapshot doesn't match actual result.
Unified diff:
{
  "name" : "Simon",
  "surname" : "Taddiken",
  "birthdate" : "1777-0-[2]+[1]-12",
  "address" : {
    "street" : "Gibtsnicht-Straße",
    "number" : "1337",
    "zipCode" : "4711",
    "city" : "Bielefeld",
    "country" : "Germany"
  }
}

This diff shows all the mismatches between expected and actual result in a clear way. If the results differ in multiple places, you can easily see them all at once (compare this to the test using assertThat(...).isEqualTo(...) which only tells you that something is wrong, but not what).

Snapshot comparison is aware of the structured data format that you use. Instead of doing a string comparison, we can also resort to explicit structure comparison using dedicated libraries like XmlUnit or JSONAssert. Those libraries might offer even better failure analysis or fine-tuning of the comparison. If we adjust the assertion to this:

        snapshot.assertThat(actual).as(json).matchesSnapshotStructure();

We will get a failure message that is provided by the JSONAssert library:

org.opentest4j.AssertionFailedError: birthdate
Expected: 1777-02-12
     got: 1777-01-12

Have a look at the linked GitHub repository for further and more detailed usage instructions.

Dealing with random values

Random values like dates or UUIDs in the test results pose a problem to this approach: they might change between test executions, causing the snapshot assertion to fail. There are multiple ideas for how to overcome this challenge:

Design your code in a way that lets you mock away those random values. For example, instead of using LocalDateTime.now(), use LocalDateTime.now(clock) and exchange the clock for a constant one in the test. Instead of directly using UUID.randomUUID(), create a strategy interface which can be exchanged with a deterministic mock during the test.
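
The clock idea can be sketched like this (TimestampService is a hypothetical example class, not part of the library):

```java
import java.time.Clock;
import java.time.LocalDateTime;

// Minimal sketch: inject a Clock so that 'now' is deterministic in tests
// and the serialized snapshot stays stable between executions.
class TimestampService {
    private final Clock clock;

    TimestampService(Clock clock) {
        this.clock = clock;
    }

    LocalDateTime currentTimestamp() {
        // Production code passes Clock.systemDefaultZone();
        // the test passes Clock.fixed(...), so every run yields the same value.
        return LocalDateTime.now(clock);
    }
}
```

In the test you would construct the service with Clock.fixed(Instant.parse("2022-01-01T12:00:00Z"), ZoneOffset.UTC), making every snapshot execution see the same timestamp.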

Another idea is to normalize the result data in a consistent way before you pass it to the assertion, or even during serialization by providing a custom SnapshotSerializer. This can become tedious though, depending on the complexity of your data. Here is a real-life example which compares the output of a Prometheus scrape endpoint (a very simple, line-based format) using a snapshot assertion. The original output contains a few random values and is unordered. In order to produce a canonical snapshot, the few known lines containing random values are removed and the remaining ones are sorted. This approach actually weakens the assertion because it no longer operates on a valid scrape result, but it is still strong enough to be a good unit test and guard against accidental changes:

Serializer which normalizes the snapshot:

class CanonicalPrometheusRegistrySerializer implements SnapshotSerializer {

    private static final Pattern CREATED_GAUGE = Pattern.compile("^.+_created\\{.*");
    private static final Pattern SCRAPE_DURATION = Pattern.compile("github_scrape_duration_sum\\{.*");

    public static SnapshotSerializer canonicalPrometheusRegistry() {
        return new CanonicalPrometheusRegistrySerializer();
    }

    @Override
    public String serialize(Object testResult) throws SnapshotException {
        return reorderAndFilter(testResult.toString());
    }

    private String reorderAndFilter(String s) {
        return s.lines()
                .filter(this::filterCreatedGauge)
                .filter(this::filterScrapeDuration)
                .sorted()
                .collect(Collectors.joining("\n"));
    }

    private boolean filterCreatedGauge(String line) {
        return !CREATED_GAUGE.matcher(line).matches();
    }

    private boolean filterScrapeDuration(String line) {
        return !SCRAPE_DURATION.matcher(line).matches();
    }

}

The actual assertion:

snapshot.assertThat(response.getBody())
        .as(canonicalPrometheusRegistry())
        .matchesSnapshotText();

You can see that this approach easily becomes a nightmare if you have to maintain a lot of such exceptional behavior.

The third option is to provide a StructuralAssertions implementation that is aware of those random values and explicitly ignores them during comparison. For example, the XmlUnit library comes with an areSimilar() assertion that could be used instead of areIdentical().

The library doesn’t prescribe a preferred or standard way of handling this challenge. I feel that the mocking approach is the cleanest one, and post-processing the test result feels more like an anti-pattern to me. Still, I think this problem deserves to be worked on. It is likely the biggest obstacle preventing you from adopting snapshot assertions in your tests.

But wait, that is not TDD

You have a point here; this workflow obviously contradicts the ideas of TDD: you need to have your implementation ready in order for it to produce the initial snapshot. You could try, though, to manually come up with a snapshot before you start your implementation. But this approach has questionable chances of success. There are just too many moving parts that influence what the final snapshot will really look like (also, you lose the main benefit of the test generating the complex data for you).

Here is my take: it is not TDD and it is not meant to be TDD. Test-driven development is just one tool in your testing toolbox, useful in certain situations: it allows you to take small steps in a red-green cycle, but only if you need that. You can of course take bigger steps if you are writing less complicated code or if you are already sure what your API is going to look like (see also this short paragraph with my personal take on TDD). In the same sense, snapshot testing gives you a different tool for different situations: when dealing with complex objects, snapshot testing can immensely reduce the cost of guarding your code against accidental changes.

This approach does not contradict TDD; it complements it.

And once you have your first real snapshot, you can actually work in a TDD way from there: if the requirements to your implementation change, you can manually modify the snapshot to reflect the new requirements before you start with your implementation.

Taking it further

If a test produces complex objects, it is likely that its input is also a complex object. At least that is the case in many of my professional projects, where we mostly implement mappings from arbitrary internal data formats into standardized XML structures. If we can store the complex result as a file, we could just as well store the complex input as a file right next to the expected output. Then we can iterate over all input files, run the test case against each input file’s contents and compare the test results against the snapshot.

The above library comes with an extra module containing a JUnit5 ArgumentsProvider that can be combined with the snapshot assertions. The provider lists all files in a specified directory and provides them as input to the test case:

@EnableSnapshotTests(snapshotDirectory = "test-input")
public class DirectoryParameterTest {

    @ParameterizedTest
    @FilesFrom(directory = "test-input", extensions = "txt")
    void test(TestFile testFile, Snapshot snapshot) throws IOException {
        // Given
        final String testInput = testFile.asText(StandardCharsets.UTF_8);

        // When
        final String actualTestResult = transform(testInput);

        // Then
        snapshot.named(testFile.name())
                .assertThat(actualTestResult)
                .asText()
                .matchesSnapshotText();
    }
}

The test is executed for each .txt file in the test-input directory. A pointer to that file is provided to each test execution as parameter of type TestFile. The test then uses the file’s contents as input for the code under test (transform(...)) and the file’s name as name for the produced snapshot. The snapshot will be written with .snapshot extension into the same directory and thus be placed right next to the input file.

Last words

The library is currently under development (hence the 0.0.x versioning) and the API is likely going to change (though I already try to keep the public API stable). I appreciate every bit of feedback if you dare to try it out in your own projects.

You can post any comments to this Twitter thread
