Structure-cementing tests and how to avoid them 2/3

Part 2 - Concepts of the TestDsl

Tests should be sensitive to behavioural changes but insensitive to structural changes. Tests that do not fulfil the second condition are called structure-cementing. In part 1, we built a TestDsl with which we can completely avoid cementing structure through the test setup. In this part, we will go into more detail about the concepts of the DSL (Domain-Specific Language [1]) and show why you can use it to write integration tests that become unit tests after changing just one line.

The test setup is a series of steps we need to take before we can test our testee. We need this because certain preconditions are required before we can test the actual behavior. The TestDsl is an abstraction layer between test and test setup that makes the test setup very comfortable and avoids structural cementation.

Let us assume that we want to test a method rentBook(bookId, userId) in a RentingService. It is very common that this method has the precondition that the book and user must be stored in a Repository before we call rentBook. Additionally, the renting user must exist and have a role and a permission called CAN_RENT_BOOK.

If you were to create all the preconditions in each test by hand, you would not only have created a lot of redundancy, but (assuming enough tests) you would also have cemented the structure of all the preconditions. Implementation details, such as what a Role looks like, how it gets into its Repository, how the RentingService gets to that Repository, are cemented with every redundant test setup. This cementation happens because an engineer simply does not want to change a high number of files in order to implement an actually sensible structural change. The engineer feels that the structure is anything but soft and more like cement.

If the entire test setup is done via the TestDsl, only the Dsl is affected by structure changes and the test is decoupled from the structure. We can make structural changes in the production code without any problems because we only have to make changes in one place, in the Dsl.

Figure description: the tests are at the top and depend only on the TestDsl. The TestDsl depends on the production code. If the production code changes, only the TestDsl has to change, not a single test.

Figure 1. The TestDsl inserts itself between tests and the structure of the production code

The TestDsl consists of the following parts:

TestState
Floor
Entity (Combo) Builder
Test Doubles (instead of structure-cementing mocks)
Service Configurator (optional)
JUnit extension (optional)

A noticeable amount of code, but not much logic. The code is always about delegating or setting values. This is good. The Dsl should think as little as possible so that we don’t create a maintenance problem.
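How little logic these parts need can be illustrated with a minimal sketch. It is not the TestDsl from part 1: the classes below are heavily simplified stand-ins, and the String ids, the single `Books` port and the field-based `Floor` are assumptions made only to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical, heavily simplified domain object and port.
record Book(String id, String title) {}

interface Books {
    void add(Book book);
    Optional<Book> findById(String id);
}

// Fake adapter that the Floor hands out in unit tests.
class InMemoryBooks implements Books {
    private final Map<String, Book> entities = new HashMap<>();
    public void add(Book book) { entities.put(book.id(), book); }
    public Optional<Book> findById(String id) { return Optional.ofNullable(entities.get(id)); }
}

// Floor: the single place that knows which repositories exist.
class Floor {
    final Books books = new InMemoryBooks();
}

// Builder: sensible defaults, a test overrides only what matters to it.
class BookBuilder {
    String id;
    String title = "Refactoring";
    BookBuilder(String id) { this.id = id; }
    Book build() { return new Book(id, title); }
}

// TestState: creates entities and remembers them until they are saved.
class TestState {
    private final List<Book> books = new ArrayList<>();
    private int nextId = 1;

    Book book() { return book(it -> {}); }

    Book book(Consumer<BookBuilder> customize) {
        var builder = new BookBuilder("book-" + nextId++);
        customize.accept(builder);
        var book = builder.build();
        books.add(book);
        return book;
    }

    // Pure delegation: every remembered entity goes to its repository.
    void saveTo(Floor floor) { books.forEach(floor.books::add); }
}
```

Every method in the sketch either sets a value or delegates. If the way a Book reaches its repository changes, only `TestState.saveTo` and `Floor` need to follow.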

Clearly, we need to invest code. However, this pays dividends quickly and allows us to write very concise tests (the final test from part 1):

Complete test with TestDsl Extension

@Integration @Test
void should_be_able_to_rent_book(TestState a, Floor floor){
    // given
    var book = a.book();
    var userCombo = a.userCombo(it -> it.hasPermission("CAN_RENT_BOOK"));
    a.saveTo(floor);

    var testee = new RentingService(floor);
    // when
    var result = testee.rentBook(book, userCombo.user());
    // then
    assertThat(result.isRented()).isTrue();
}

We can now also turn this test into a quick @Unit test, just by replacing the @Integration annotation. Legacy code in particular benefits from this feature of the TestDsl, because we often still have a lot of logic in the database and have to test at integration level before remediation. Over time, this logic ends up in the domain and we can turn existing tests into significantly faster unit tests with a one-liner. Without a TestDsl, you would have to completely rewrite them at unit level, which is why many teams do not do this, remain stuck with slow integration tests and cannot iterate faster despite tests.

In the following chapters, we will see why this is possible and which concepts are behind the Dsl. In Part 3, we will look at the second and final type of structure-cementing tests: Tests that test unstable elements and not modules.

Unit and integration tests

Unit and integration tests are very vague terms across the industry ([2], [3]). Even in small teams there is no clear definition; everyone has their own understanding. So we should pause for a moment and define what we mean when we say @Unit @Test.

The tests written in part 1 are sociable unit tests [4] and follow the unit test definition by Michael Feathers [5]:

A test is not a unit test if:

It talks to the database

It communicates across the network

It touches the file system

It can’t run at the same time as any of your other unit tests

You have to do special things to your environment (such as editing config files) to run it.

Tests that do these things aren’t bad. Often they are worth writing, and they can be written in a unit test harness. However, it is important to be able to separate them from true unit tests so that we can keep a set of tests that we can run fast whenever we make our changes.

— Michael Feathers [5]

Around 2010, Google did not know this definition or their teams could not agree on it. However, they knew that it is hugely beneficial for internal communication if everyone uses the same names for the same things. They couldn’t agree on the same definitions so they introduced new 'data-driven naming conventions' for tests. Their definition of a small test is pretty close to Feathers' definition (Google Test Sizes) and the medium test provides a pretty good definition of an integration test.

Table 1. Google Test Sizes [3]
Feature	Small (Unit)	Medium (Integration)	Large (Acceptance)
Database	No	Yes	Yes
Network access	No	localhost only	Yes
File system access	No	Yes	Yes
Use external systems	No	Discouraged	Yes
Multiple threads	No	Yes	Yes
Sleep statements	No	Yes	Yes
System properties	No	Yes	Yes
Time limit (seconds)	60	300	900+

The table also shows why unit tests are so fast: there is no out-of-process dependency with which our tested code has to interact. Everything runs in-process, in-memory and without network access. This is the perfect basis for the majority of our tests, because the next level, integration or medium, can already be significantly slower. Depending on the test runner and infrastructure, unit tests in customer projects are between 4 and 10 times faster than integration tests. We were only able to achieve a factor of 4 with our integration tests by parallelising them with a little trick.

If each test is given its own namespace in the database (for example a separate schema in PostgreSQL or a separate database in MongoDB), then each integration test can only see its own data and can only modify its own data. Test isolation is thus restored.
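One way to derive such a namespace is sketched below. The helper name `TestNamespaces`, the `test_` prefix and the length limit are assumptions for illustration; how the namespace is then created and dropped depends on the database and is not shown.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: derives a unique, database-friendly namespace per test.
final class TestNamespaces {
    // Unique within one JVM only. With several test JVMs a random suffix would be needed as well.
    private static final AtomicLong COUNTER = new AtomicLong();

    private TestNamespaces() {}

    static String next(String testName) {
        // Keep only characters that are safe in most schema or database names.
        String safe = testName.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "_");
        if (safe.length() > 40) {
            safe = safe.substring(0, 40);
        }
        return "test_" + safe + "_" + COUNTER.incrementAndGet();
    }
}
```

Each test asks for a namespace before it connects. Because the counter never repeats within a run, two tests running in parallel never share data.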

From integration to unit test

We can convert our test from @Integration to @Unit with a one-liner. The JUnit extension switches all repositories in the background. The production repository JpaBooks becomes an InMemoryBooksDouble. The api of the TestDsl remains the same, which is why we do not need to make any further changes to the test. We don’t have to change anything in the tested code either, because it only knows the interface Books { add(Book book); /* … */ } and not which implementation is behind it.

For this change to work so smoothly, however, the InMemory and Jpa repositories must also behave in the same way. In the following chapter, we will see how we can continuously ensure this with the so-called port contract tests.

However, it does not always make sense to implement all methods of Books in InMemoryBooksDouble and to keep them synchronised with port contract tests. Sometimes we need the powerful query functionalities of databases not for business logic, but for search functions in the UI. On the one hand, it would be a huge overhead to rebuild these in-memory for a few @Unit tests. On the other hand, these tests would then really only test our InMemory repository implementation. In such cases, we prefer to throw a NotImplementedException in the InMemory double and stick with @Integration tests (for now). We can always change our mind if business logic actually requires the query method.
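A double that refuses such a query could look like the following sketch. The method `searchFullText` and the String ids are hypothetical. The text mentions a NotImplementedException, which is not part of the public JDK API, so the sketch substitutes the JDK's `UnsupportedOperationException`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

record Book(String id, String title) {}

interface Books {
    void add(Book book);
    Book findById(String id);
    // Hypothetical UI search that relies on database query features.
    List<Book> searchFullText(String query);
}

class InMemoryBooksDouble implements Books {
    private final Map<String, Book> entities = new HashMap<>();

    @Override
    public void add(Book book) { entities.put(book.id(), book); }

    @Override
    public Book findById(String id) { return entities.get(id); }

    @Override
    public List<Book> searchFullText(String query) {
        // Deliberately not rebuilt in memory: tests that need it stay @Integration.
        throw new UnsupportedOperationException(
            "searchFullText is not implemented in the in-memory double; use an @Integration test");
    }
}
```

A unit test that accidentally depends on the search fails immediately with a clear message instead of silently testing a homegrown reimplementation.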

Keeping doubles synchronised to production code with port contract tests

So far we have assumed that a Jpa repository can always be replaced by an InMemory repository. This is possible because we combine the Ports & Adapters Architecture [6] with so-called port contract tests [23].

JpaBooks implements the interface Books. The interface is a so-called port. All classes that implement the interface are adapters of it. However, the domain logic only knows the ports and not which implementation is behind them. This means that we have decoupled the domain logic from what the code that communicates with the outside world actually looks like. Theoretically, an implementation of the port does not even have to exist when writing the domain logic.

The Ports & Adapters Architecture [6] helps us design better. We can model the domain logic first before we have to turn to implementation details. The architecture also offers us the option of replacing real adapters with test doubles [8] for tests. In our unit tests, we therefore use an InMemoryBooksDouble instead of a slower and more expensive JpaBooks repository.

InMemoryBooksDouble is a specific type of double, a so-called fake [9]. In contrast to the other double types (dummies, stubs, and mocks [10]), fakes are working implementations of ports that take shortcuts that the production code cannot take, in this case the InMemory solution.

In contrast to other doubles, however, the fake must fulfil the expectations that the domain code has of the port. With repositories, for example, the domain code expects that an entity that was added with add() can then also be found again with a find(). The expectations that the domain has of the port are called contract and we can check them with a port contract test.

Port Contract Test of our Port

import org.junit.jupiter.api.Test;

import static org.assertj.core.api.Assertions.assertThat;

public abstract class BooksContract { // (1)
    abstract Books testee(); // (2)

    @Test
    void should_remember_book(TestState a){ // (3)
        // given
        var book = a.book();
        var testee = testee();
        // when
        testee.add(book);
        // then
        assertThat(testee.findById(book.id())).isEqualTo(book);
    }
}

(1) The contract is abstract. It only becomes an executable test when it is implemented.
(2) We only know the port in the test, not the implementation.
(3) Each test describes behaviour that we expect from the port.

The implementation test is very short for both the fake and the production adapter.

Test of the Port Adapter

@Unit
public class InMemoryBooksTest extends BooksContract {

    @Override
    Books testee() { // (1)
        return new InMemoryBooksDouble();
    }
}

(1) Adapter tests usually only implement the method that creates the testee.

And the fake is also very easy to write thanks to a reusable base.

An InMemory fake is quick to write thanks to the base class

public class InMemoryBooksDouble
            extends BaseInMemoryDouble<BookId, Book>
            implements Books { // (1) (2)
}

(1) In most repositories, we do not need to implement any special methods here and only use what the base also has.
(2) Special methods are usually only created by queries. We can solve simple queries with the `filter(predicate)` method from the base class. For more complex filter methods, however, we can always say that we do not implement them and prefer to use a slower integration test.

The base class has little logic and always delegates to the JDK map

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public abstract class BaseInMemoryDouble<TId, TEntity extends Entity<TId>> {
    private final Map<TId, TEntity> entities = new HashMap<>(); // (1)

    public List<TEntity> filter(Predicate<TEntity> predicate){
        return this.entities.values().stream()
                    .filter(predicate)
                    .toList(); // (2)
    }

    // (3) findById(), add(), remove(), removeIf() and count() are omitted here
}

(1) For tests, a plain HashMap is all we need here. However, if we also intend to test parallel code, we should use a ConcurrentHashMap straight away.
(2) Simple queries can be solved using a predicate. For our unit tests, we don’t need anything complicated with indices because our HashMap only contains a few entities.
(3) Other methods such as `findById()`, `add()`, `remove()`, `removeIf()` and `count()` only pass through to the (concurrent) HashMap. We do not implement anything special here, but use what the JDK gives us.
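The pass-through methods that the listing leaves out could look roughly like this. The sketch is an assumption, not the original base class: the shape of `Entity`, the nullable return value of `findById()` and the choice of ConcurrentHashMap are all illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Assumed shape of the Entity interface: every entity exposes its id.
interface Entity<TId> {
    TId id();
}

abstract class BaseInMemoryDouble<TId, TEntity extends Entity<TId>> {
    private final Map<TId, TEntity> entities = new ConcurrentHashMap<>();

    public void add(TEntity entity) { entities.put(entity.id(), entity); }

    // Returns null when the id is unknown (an assumption of this sketch).
    public TEntity findById(TId id) { return entities.get(id); }

    public void remove(TId id) { entities.remove(id); }

    public boolean removeIf(Predicate<TEntity> predicate) {
        return entities.values().removeIf(predicate);
    }

    public long count() { return entities.size(); }

    public List<TEntity> filter(Predicate<TEntity> predicate) {
        return entities.values().stream().filter(predicate).toList();
    }
}

// Hypothetical entity and double: the subclass body stays empty.
record BookId(String value) {}
record Book(BookId id, String title) implements Entity<BookId> {}
class InMemoryBooksDouble extends BaseInMemoryDouble<BookId, Book> {}
```

None of the methods contains a decision of its own; each one is a single call into the map.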

With these tests, we can now guarantee that all adapters of the port behave in the same way. They will always be synchronised with what we define as an expectation (aka contract) in the tests.

Contract tests are an idea from J. B. Rainsberger [11]. We only call them port contract tests here to make it more explicit which contract you want to test. This also distinguishes them from the integration contract tests [12] and the consumer-driven contracts [13] approach. An alternative name for the port contract tests is role tests [14].

Structure-cementing mocks and flexible doubles

In our test, we have so far only used one form of Test Doubles [8], the InMemory Fakes [9]. In addition to the fakes, there are also stubs, spies and mocks. They are defined as follows:

Fakes [9]: are working implementations that can take shortcuts that the production code cannot take. We keep them synchronised with port contract tests. Fakes can be recognised by the fact that their implementation does roughly the same as the production implementation.
Stubs [15]: allow us to put indirect inputs into our test. Indirectly, because these inputs are not passed as parameters to the testee, but the testee pulls the inputs itself. Stubs can be recognised by the fact that we pass them test data, which they return as bluntly as possible when requested by the testee. There is no great logic here.
Mocks [16]: allow us to check indirect outputs from our testee. Indirectly, because you don’t get these outputs as a return value from the testee, but have to retrieve and verify them via detours. This is also known as behaviour verification. Mocks can be recognised by the fact that you ask the mock directly to verify whether it has been called (with certain parameters). The test calls a framework method (verify(mock).didSth(withParam)) or a self-written method (mock.verifyAddWasCalled()).

All three Test Doubles can be implemented with a mocking framework, but they can also be implemented without one. Fakes and stubs benefit from implementing them by hand. It’s not much code, you have a single implementation for multiple tests and the code is easier to read because it is just code and no framework syntax.
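For the rare case where a mock is needed, a hand-written one might look like the sketch below. The port `RentalNotifications` and the simplified `RentingService` are hypothetical and exist only to show a self-written verify method in the style of `mock.verifyAddWasCalled()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical outgoing port: the testee notifies an external system.
interface RentalNotifications {
    void bookRented(String bookId, String userId);
}

// Hand-written mock: records every call and offers a verify method.
class RentalNotificationsMock implements RentalNotifications {
    private final List<String> calls = new ArrayList<>();

    @Override
    public void bookRented(String bookId, String userId) {
        calls.add(bookId + "/" + userId);
    }

    void verifyBookRentedWasCalled(String bookId, String userId) {
        if (!calls.contains(bookId + "/" + userId)) {
            throw new AssertionError(
                "expected bookRented(" + bookId + ", " + userId + ") but got " + calls);
        }
    }
}

// Simplified stand-in for a testee whose only observable effect is the indirect output.
class RentingService {
    private final RentalNotifications notifications;

    RentingService(RentalNotifications notifications) {
        this.notifications = notifications;
    }

    void rentBook(String bookId, String userId) {
        notifications.bookRented(bookId, userId);
    }
}
```

A test creates the mock, passes it to the testee, calls `rentBook` and then asks the mock to verify the call. The knowledge about the port's signature lives in one class instead of in every test.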

A mocking framework really only makes sense for mocks because it allows you to specify the expected behavior in the same location as the test. But since you only need mocks very rarely, you only need mocking frameworks very rarely. This is good because the excessive use of the framework also leads to structure cementation.

Figure description: reimplementing the behavior of classes via mocks cements the structure of the production code.

Figure 2. Reimplementation of the same method in n tests leads to structure cementation

If we reimplement the same methods again and again in n tests, then:

we cement the design at the type level.
our reimplementation may deviate from the real code. The deviation can even be so strong that we break the encapsulation of the port [17].

The former deprives us of the possibility to change our structure. But the latter is perhaps even worse, because our tests can be green with the mock while they would be red with the actual production code. As a result, we no longer trust our tests.

In ‘The Art of Unit Testing’ [18], the recommendation is to only use mocks if we want to test the interaction with an external service. Then you only need mocks in 2% to 5% of unit tests.

For the vast majority of tests, we therefore use either no double at all (a method that only calculates, so we can assert on the return value), an (in-memory) fake or a stub. We write fakes and stubs quickly by hand.

A simple stub

public class IsbnApiEchoDouble { // (1)

    private final String bookTitleEcho;

    public IsbnApiEchoDouble(String bookTitleEcho){
        // fall back to a default title when the test does not care
        this.bookTitleEcho = bookTitleEcho != null
                                ? bookTitleEcho
                                : "Refactoring";
    }

    public String findTitle(Isbn isbn) {
        return this.bookTitleEcho; // (2)
    }
}

(1) There are different types of stubs. This one always returns an echo of the values it received in the constructor.
(2) No special logic here. Just return what you got in the constructor.

Writing it yourself also gives us a single place where we can maintain structural changes to the real port without affecting the test.

Builder Design

The generic with() method accelerates the writing of the initial builder but requires public fields.

Entity-TestBuilder

public class BookBuilder extends TestBuilder<Book> {

    public BookId id = ids.next(BookId.class);
    public String title = "Refactoring";
    public String author = "Martin Fowler";
    public Instant createdOn = clock.now();

    public BookBuilder(Clock clock, Ids ids){
        super(clock, ids);
    }

    public Book build(){
        return new Book(id, title);
    }

    public BookBuilder with(Consumer<? super BookBuilder> action) {
        action.accept(this);
        return this;
    }

}

Public fields are a trade-off we can take, at least initially. Realistically we are going to want to switch to more specific withX() or isX() methods sooner rather than later for one of two reasons:

the new methods make testing more convenient.
the new methods don’t allow error conditions.

Suppose for example we make the author name no longer stringly but strongly typed [19] as AuthorName (provides more compile-time safety similar to the Ids). Then the generic with() method is no longer as convenient to use, because we always have to write:

with(it -> { it.author = new AuthorName("Alistair"); })

To combat this we can introduce a withAuthor(String name) and a withAuthor(AuthorName name) overload to make our builder more convenient to use and keep our tests readable.
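A sketch of the two overloads, assuming a simple `AuthorName` record and leaving out the `TestBuilder` base class, the Clock and the Ids for brevity:

```java
// Hypothetical strong type for the author name.
record AuthorName(String value) {
    AuthorName {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("author name must not be blank");
        }
    }
}

record Book(String title, AuthorName author) {}

class BookBuilder {
    private String title = "Refactoring";
    private AuthorName author = new AuthorName("Martin Fowler");

    BookBuilder withTitle(String title) {
        this.title = title;
        return this;
    }

    // Convenience overload: tests can keep passing plain strings.
    BookBuilder withAuthor(String name) {
        return withAuthor(new AuthorName(name));
    }

    BookBuilder withAuthor(AuthorName name) {
        this.author = name;
        return this;
    }

    Book build() {
        return new Book(title, author);
    }
}
```

A test then reads `new BookBuilder().withAuthor("Alistair").build()`, and the wrapping into `AuthorName` happens in exactly one place.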

The second reason applies when two or more fields depend on each other. For example, a Book gets a field rentedOn, and rentedOn must always be after createdOn. With our generic with(), however, we can create an object that is invalid because we have only set rentedOn. This is not a big problem if we always validate in the constructor of a class or record whether the fields (aka the state) are correct. However, BookBuilder would then allow something that Book rejects at runtime with an IllegalArgumentException.

In order to have more compile-time safety again, we can make the field rentedOn private again in the builder and introduce isRentedOn(Duration rentedAfterCreate) together with the overload isRentedOn(Instant createdOn, Duration rentedAfterCreate). The new prefix, is, shows us that the method conceptually does something different than a with. is declares that the method sets several interdependent values. The overload shows us which value the parameter rentedAfterCreate is dependent on.
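The two isRentedOn overloads might be sketched like this. The fixed default for `createdOn` and the reduced `Book` record are assumptions; the original builder takes its time from a Clock.

```java
import java.time.Duration;
import java.time.Instant;

record Book(String title, Instant createdOn, Instant rentedOn) {
    Book {
        // The entity still validates its own state.
        if (rentedOn != null && !rentedOn.isAfter(createdOn)) {
            throw new IllegalArgumentException("rentedOn must be after createdOn");
        }
    }
}

class BookBuilder {
    private String title = "Refactoring";
    private Instant createdOn = Instant.parse("2024-01-01T00:00:00Z");
    private Instant rentedOn; // private: only set together with createdOn

    // Keeps the current createdOn and derives rentedOn from it.
    BookBuilder isRentedOn(Duration rentedAfterCreate) {
        return isRentedOn(this.createdOn, rentedAfterCreate);
    }

    // Sets both interdependent values at once.
    BookBuilder isRentedOn(Instant createdOn, Duration rentedAfterCreate) {
        this.createdOn = createdOn;
        this.rentedOn = createdOn.plus(rentedAfterCreate);
        return this;
    }

    Book build() {
        return new Book(title, createdOn, rentedOn);
    }
}
```

Because `rentedOn` can only be expressed relative to `createdOn`, a test can no longer set one without the other. A zero or negative duration is still only caught at runtime by the validation in `Book`.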

The new prefix is also there so that we can recognise whether our builder is starting to become too complex. If the number of is methods exceeds the number of with methods, then our builder is in dangerous waters.

TestDsl in combination with Spring

The JUnit extension written in part 1 can also be made compatible with @SpringBootTest. The extension only has to check whether an ApplicationContext exists. If so, it pulls the floor from the Spring DI-Container instead of from the JUnit Store.

TestDsl with Floor supplied by Spring

@Override
public Object resolveParameter(
        ParameterContext parameterContext,
        ExtensionContext extensionContext
    ) throws ParameterResolutionException {
    // ...
    // null if the Spring context does not define a Floor bean
    var springFloor = SpringExtension
        .getApplicationContext(extensionContext)
        .getBeanProvider(Floor.class)
        .getIfAvailable();
    // ...
}

Using the annotation, we can now write the test for a controller.

SpringBoot Controller Test with TestDsl

@Integration @SpringBootTest @Test // (1)
void should_be_able_to_rent_book_via_api(
        TestState a,
        Floor floor,
        @Autowired BookController testee){ // (2)
    // rest of test
    // (3)
}

(1) We combine the SpringBootTest annotation with the TestDsl annotation.
(2) We ask Spring to inject the `testee`.
(3) We can use the TestDsl here as in any other test. The repositories that Spring recognises and those of the TestDsl are the same.

If you use @SpringBootTest, you have to be careful how you write your tests and how extensive they are. The Spring Application Context is cached across tests, which undermines test isolation. Modifications that one test makes can cause a test that runs later to fail. Our tests become brittle.

Unit tests should therefore test (domain) logic without Spring. This also corresponds to the recommendation that the Spring Framework documentation has made since version 2 [20] and has maintained up to the current version 6 [21]. An @Integration @SpringBootTest can be added sporadically for important test paths through the application.

Low and High Level Test DSLs

The TestDsl for @Unit and @Integration shown so far is a Low-Level TestDsl. It counteracts structure cementation and makes tests 'under the hood' easier to write. Thanks to direct access to domain objects, we are very flexible as to which test states we can create. We can use it to check the happy path, the sad paths and also many strange paths, i.e. paths that should never actually occur.

However, it is not written from the user’s perspective and cannot be used to verify that the system is behaving correctly from the user’s perspective. For such tests, we need a running system that we can access from outside via a browser, HttpApi or similar. Google would call these tests ‘Large’ [3] (Google Test Sizes). Other common names are system tests or user acceptance tests.

For these tests, we need a new Dsl with a different structure but a very similar concept behind it. However, this high-level @Acceptance Dsl no longer has anything to do with structure cementation, but with Ui or Api cementation. The more tests we have that require a certain button or a certain widget, the more this UI component is cemented. In the case of a public api, this cementing is perhaps intentional, as you want to offer others a stable api. But even then, a Dsl is recommended because it makes the tests much more readable and maintainable.

The High-Level TestDsl briefly outlined below is the implementation of the 4 Layer Acceptance Test Structure by Dave Farley [22]. The 4 layers are:

top: our test
DSL per domain: renting, buying, etc.
protocol drivers: UI, API, external system stub
the system under test

When we follow this structure, our tests no longer access the api of our system directly. There is no http.get("/api/users"). The test also does not click directly in the browser. There is no page.navigate() or page.click(). The test only knows the next layer, the Dsl.

The Dsl only offers domain-specific user goals, not how the goals are technically implemented (with the renting Dsl we could implement .findBook("Refactoring").rent(), for example). It only knows the protocol drivers and delegates the implementation to them.

Only the drivers know the system under test. The UI driver knows how to implement the goals with Playwright, for example, while the Api protocol driver can implement them using RestAssured. Which driver is used is controlled by annotation.

High-Level TestDsl

@Acceptance @UiProtocol @ApiProtocol @Test // (1)
void should_be_able_to_rent_book(InventoryDsl inventory, RentingDsl renting){
    // given
    inventory.addBook("Refactoring"); // (2) (3)
    var book = renting.findBook("Refactoring");
    // when
    book.rent();
    // then
    assertThat(book.isRented()).isTrue();
}

(1) We carry out this test via the browser but also via the HttpApi.
(2) As with the low-level Dsl, each test must create its complete state.
(3) Unlike the low-level Dsl, however, this Dsl takes significantly larger steps. Creating a book can consist of many browser actions or api calls. If one of the intermediate steps fails, the Dsl aborts immediately and provides specific feedback as to which of the intermediate steps did not work.
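The layering between test, Dsl and protocol driver can be sketched without any browser or HTTP library. Everything below is a hypothetical simplification: the driver interface, the in-memory driver that stands in for a real system under test, and a single `RentingDsl` that also adds books (in the test above this is the job of a separate InventoryDsl).

```java
import java.util.HashMap;
import java.util.Map;

// Layer 3: protocol driver. Real drivers would use a browser or an HTTP client.
interface RentingDriver {
    void addBook(String title);
    void rentBook(String title);
    boolean isRented(String title);
}

// Stand-in for a driver plus layer 4, so the sketch runs without a system under test.
class InMemoryRentingDriver implements RentingDriver {
    private final Map<String, Boolean> rentedByTitle = new HashMap<>();

    public void addBook(String title) { rentedByTitle.put(title, false); }

    public void rentBook(String title) {
        if (!rentedByTitle.containsKey(title)) {
            // Abort immediately with specific feedback about the failed step.
            throw new IllegalStateException("rentBook failed: no book titled '" + title + "'");
        }
        rentedByTitle.put(title, true);
    }

    public boolean isRented(String title) { return rentedByTitle.getOrDefault(title, false); }
}

// Layer 2: the Dsl offers user goals and delegates the "how" to the driver.
class RentingDsl {
    private final RentingDriver driver;

    RentingDsl(RentingDriver driver) { this.driver = driver; }

    void addBook(String title) { driver.addBook(title); }

    BookHandle findBook(String title) { return new BookHandle(title, driver); }
}

// Handle returned to the test; it knows the driver but the test does not.
class BookHandle {
    private final String title;
    private final RentingDriver driver;

    BookHandle(String title, RentingDriver driver) {
        this.title = title;
        this.driver = driver;
    }

    void rent() { driver.rentBook(title); }

    boolean isRented() { return driver.isRented(title); }
}
```

Swapping `InMemoryRentingDriver` for a UI or Api driver changes nothing in the test, because the test only ever talks to `RentingDsl` and `BookHandle`.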

You can also parallelise these tests in a similar way as we have done with integration tests: we can either provide a namespace per test directly in our system under test or solve this via our Dsl.

The former is possible if you build multi-tenant capability into your system right from the start. Each entity then needs an additional TenantId and you have to ensure that everyone can only see the data of their own tenant. If you now create a new tenant for each test and the test also creates all preconditions in the form of entities, then the tests are isolated from each other via the TenantId and can therefore be parallelized.

If the TenantId cannot be built directly into the system, the test data aliasing [22] mentioned by Dave Farley is used. With this pattern, the TestDsl itself ensures that the test data is unique. It then adds a test-unique key to fields. addBook("Refactoring") does not create the book "Refactoring", but the book "Refactoring dbac1q23". findBook("Refactoring") does not search for "Refactoring", but for "Refactoring dbac1q23". When writing the test, however, you must be careful not to assert the number of books or similar, as this could change continuously due to tests running in parallel.
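A minimal sketch of such aliasing, assuming the Dsl creates one `TestDataAliases` instance per test. The class name and the eight-character UUID suffix are illustrative choices, not taken from the referenced talk.

```java
import java.util.UUID;

// Hypothetical helper inside the Dsl: turns names used in a test into test-unique aliases.
class TestDataAliases {
    // One random suffix per instance, i.e. per test.
    private final String suffix = UUID.randomUUID().toString().substring(0, 8);

    String aliasFor(String name) {
        return name + " " + suffix;
    }
}
```

Both `addBook` and `findBook` in the Dsl would pass the title through `aliasFor` before handing it to the protocol driver, so the test itself keeps using the readable name.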

Overall, the high-level Dsl described here complements the low-level Dsl with the user view. We write the majority of the tests with the low-level Dsl; we test critical application areas in particular with the high-level Dsl from the user’s perspective.

Outlook

The TestDsl combines existing concepts such as builders, ports [6], port contract tests [7], stubs [10] and fakes [9] and provides a standardized api for all our unit and integration tests. With the TestDsl, we were able to solve structure cementation through redundant test setup. We will show how we use the TestDsl to prevent structure cementing through tests at the wrong level in Part 3.

If you are interested in more content about the topic, you can view TestDsl sample code online [24] or watch the presentation on “Beehive Architecture” [25], which also revolves around the TestDsl.

This article was originally published in Java Aktuell 5/24 in 🇩🇪. It is translated and republished here with the magazine’s permission.

References

[1] M. Fowler, „Domain Specific Language“. 2008. Available here: https://martinfowler.com/bliki/DomainSpecificLanguage.html
[2] M. Fowler, „On the Diverse And Fantastical Shapes of Testing“. 2021. Available here: https://martinfowler.com/articles/2021-test-shapes.html
[3] S. Stewart, „Test Sizes“. 2010. Available here: https://testing.googleblog.com/2010/12/test-sizes.html
[4] M. Fowler, „Unit Test“. 2014. Available here: https://martinfowler.com/bliki/UnitTest.html#SolitaryOrSociable
[5] M. Feathers, „A Set of Unit Testing Rules“. 2005. Available here: https://www.artima.com/weblogs/viewpost.jsp?thread=126923
[6] A. Cockburn, „Hexagonal architecture“. 2005. Available here: https://alistair.cockburn.us/hexagonal-architecture/
[7] R. Gross, „Contract Tests in Kotlin“. 2020. Available here: http://richargh.de/posts/Contract-Tests-in-Kotlin
[8] G. Meszaros, „Test Double“. 2011. Available here: http://xunitpatterns.com/Test%20Double.html
[9] G. Meszaros, „Fake Object“. 2011. Available here: http://xunitpatterns.com/Fake%20Object.html
[10] M. Fowler, „Mocks Aren’t Stubs“. 2007. Available here: https://martinfowler.com/articles/mocksArentStubs.html
[11] J. B. Rainsberger, „Getting Started with Contract Tests“. 2017. Available here: https://blog.thecodewhisperer.com/permalink/getting-started-with-contract-tests
[12] M. Fowler, „Integration Contract Test“. 2011. Available here: https://martinfowler.com/bliki/ContractTest.html
[13] I. Robinson, „Consumer-Driven Contracts: A Service Evolution Pattern“. 2006. Available here: https://martinfowler.com/articles/consumerDrivenContracts.html
[14] M. Rivero, „Role tests for implementation of interfaces discovered through TDD“. 2022. Available here: https://codesai.com/posts/2022/04/role-tests
[15] G. Meszaros, „Test Stub“. 2011. Available here: http://xunitpatterns.com/Test%20Stub.html
[16] G. Meszaros, „Mock Object“. 2011. Available here: http://xunitpatterns.com/Mock%20Object.html
[17] M. Seemann, „Stubs and mocks break encapsulation“. 2022. Available here: https://blog.ploeh.dk/2022/10/17/stubs-and-mocks-break-encapsulation/
[18] R. Osherove, „Art of Unit Testing (3. Edition)“. 2024. Available here: https://www.artofunittesting.com/
[19] S. Hanselman, „Stringly Typed vs Strongly Typed“. 2022. Available here: https://www.hanselman.com/blog/stringly-typed-vs-strongly-typed
[20] T. Spring, „Unit Testing“. 2006. Available here: https://docs.spring.io/spring-framework/docs/2.0.4/reference/testing.html#unit-testing
[21] T. Spring, „Unit Testing“. 2022. Available here: https://docs.spring.io/spring-framework/docs/6.0.0/reference/html/testing.html#unit-testing
[22] D. Farley, „Acceptance Testing for Continuous Delivery [#AgileIndia2019]“. 2019. Available here: https://www.youtube.com/watch?v=Rmz3xobXyV4
[23] R. Gross, „Contract Tests in Kotlin“. 2020. Available here: http://richargh.de/posts/Contract-Tests-in-Kotlin
[24] R. Gross, „TestDsl (Avoid structure-cementing Tests)“. 2024. Available here: https://github.com/Richargh/testdsl
[25] R. Gross, „Beehive Architecture“. 2023. Available here: http://richargh.de/talks/#beehive-architecture
