> Mock where the object is used, not where it’s defined.
For anyone looking for generic advice: this is a quirk of Python due to how imports work in that language (details in the linked post) and shouldn't be considered universal.
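To make the quirk concrete, here's a minimal, self-contained sketch; the app module is built inline purely as a stand-in for a real app.py that did `from json import loads`:

    import json
    import sys
    import types
    from unittest import mock

    # Stand-in for a module app.py containing `from json import loads`;
    # the point is that app holds its own reference to the function.
    app = types.ModuleType("app")
    app.loads = json.loads
    app.read_opts = lambda text: app.loads(text)
    sys.modules["app"] = app

    with mock.patch("json.loads", return_value={"mocked": True}):
        print(app.read_opts("{}"))   # {} -- patching where it's defined misses app's reference

    with mock.patch("app.loads", return_value={"mocked": True}):
        print(app.read_opts("{}"))   # {'mocked': True} -- patching where it's used works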
But what if you just passed in the contents of the file or something?
Edit: oh wait actually this is what the very last line in the blog post says! But I think it should be emphasized more!
Don't test the wrong things; if you care about some precondition, that should be an input. If you need to measure a side effect, that should be an output. Don't tweak global state to do your testing.
As such I disagree. Global state is what you should be testing, but you need to be smart about it. How you set up and verify global state matters. Don't confuse the global state above with the global state of variables; I mean the external state of the program before and after, which means network, files, time, and other IO things.
Again I've heard "but what if my database/table changes so rapidly that I need the mock so I don't need to change the query all the time", in which case you ought to take a moment to write down what you're trying to accomplish, rather than using mocks to pave over poor architectural decisions. Eventually, the query fails and the mock succeeds, because they were completely unrelated.
So far I've only seen mocks fail eventually and mysteriously. With setups and DI you can treat things mostly as a black box from a testing point of view, but when mocks are involved you need surgical precision to hit the right target at the right time.
I haven't seen mocks fail mysteriously. I've seen them fail often though because requirements change and instead of updating the callers (generally a small number) you end up with 200 tests failing and give up because updating all the tests is too hard. Mocks are always about implementation details - sometimes you have no choice, but the more you can test actual behavior the better.
In my experience global state is the testing bug farm. Tests that depend on global state are usually taking dependencies they aren’t even aware of. Test initializations grow into complex “poke this, prod that, set this value to some magic number” setups that attempt to tame the global state but as global state grows, this becomes more and more difficult and inconsistent. Inter-test dependencies sneak in, parallelism becomes impossible, engineers start turning off “flaky” tests because they’ve spent hours or days trying to reproduce failures only to eventually give up.
This sort of development is attractive when starting up a project because it’s straightforward and the testing “bang for the buck” is high. But it degrades quickly as the system becomes complex.
> Instead of mocking your database call to always return "foo" when the word "SELECT" is in the query, insert a real "foo" in a real test database and perform a real query.
Consider not sprinkling “select” statements throughout your code instead. This tight coupling makes good testing much more difficult (requiring the “set all the global state” model of testing) but is also just generally not good code structure. The use of SQL is an implementation detail that most of your code shouldn’t need to know about.
A thin layer around the DB interactions gives you a smaller set of code that needs to be tested with state, gives you a scoped surface area for any necessary mocking, makes it much easier if you need to change storage systems, and also gives you a place where you can reason over all the possible DB interactions. This is just good separation of concerns.
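A rough sketch of the shape I mean (the table, names, and schema are made up for illustration):

    # The thin storage layer: the only code that knows SQL exists.
    class UserStore:
        def __init__(self, conn):
            self._conn = conn

        def get_email(self, user_id: int) -> str | None:
            row = self._conn.execute(
                "SELECT email FROM users WHERE id = ?", (user_id,)
            ).fetchone()
            return row[0] if row else None

    # Business logic depends on the narrow interface, not on SQL, so tests
    # can hand it a real sqlite connection, a fake, or (if you must) a mock.
    def notify_user(store: UserStore, user_id: int) -> str:
        email = store.get_email(user_id)
        if email is None:
            raise LookupError(f"no such user: {user_id}")
        return f"sending notification to {email}"

The store gets a handful of tests against a real test database (sqlite is usually enough); everything above it never needs to know SQL exists.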
I tamed our inter-test dependencies by doing things like starting dbus on a different port for each test - now I can test with real dbus in the loop and my tests are fast and isolated. We have a strict rule about which directories we are allowed to write to (embedded system - the others are read-only in production), so it is easy to point those to a temp dir. It was some work to set that up, but it tames most of the issues with global state and allows me to verify what really counts: the system works.
For a CRUD web app your database separation of concerns makes sense. However in my domain we have lots of little data stores and nobody else needs access to that store. As such we put it on each team to develop the separation that makes sense for them - I don't agree with all their decisions, but they get to deal with it.
Tests that work and verify the system works are the Pri0 requirement. Most of the conversations about how best to test are structured for the benefit of people who are struggling with meeting the Pri0 because of maintainability. With enough effort any strategy can work.
> However in my domain we have lots of little data stores and nobody else needs access to that store.
If the little data stores are isolated to small individual areas of code then you probably already have the necessary isolation. Introducing the lightweight data store isolation layer might be useless (or not, context dependent). Now if these individual areas are doing things like handing off result sets to other code then I would have something different to say.
Rarely should a mock be “interacting with the underlying code”, because it should be a dead end that returns canned data and makes no other calls.
If your mock is calling back into other code you’ve probably not got a mock but some other kind of “test double”. Maybe a “fake” in Martin Fowler’s terminology.
If you have test doubles that are involved in a bunch of calls back and forth between different pieces of code then there’s a good chance you have poorly factored code and your doubles are complex because of that.
Now, I won’t pretend changes don’t regularly break test doubles, but for mocks it’s usually method changes or additions and the fix is mechanical (though annoying). If your mocks are duplicating a bunch of logic, though, then something else is going on.
A test should not fail when the outputs do not change. In pursuit of this ideal I often end up with fakes (to use Martin Fowler's terms) of varying levels of complexity, but not "mocks" as many folks refer to them.
[0] - https://docs.python.org/3/library/unittest.mock.html#unittest.mock.Mock.assert_called_once
There are some specific cases, such as validating that caching is working as expected, where it can make sense to fully validate every call. Most of the time, though, this is a pointless exercise that serves mostly to make it difficult to maintain tests.
It can sometimes also be useful as part of writing new code, because it can help validate that your mental model for the code is correct. But it’s a nightmare for maintenance and committing over-constrained tests just creates future burden.
In Fowler terminology I think I tend to use Stubs rather than Mocks for most cases.
A particularly complex fake can even be unit-tested, if need be. Of course, if you're writing huge fakes, there's probably something wrong with your architecture, but I feel like good testing practices should give you options even when you're working with poorly architected code.
In the example the author walks through, a cleaner way would be to have the second function take the Options as a parameter and decouple those two functions. You can then test both in isolation.
Note that I said test doubles. Mocks are a bit over specific - they are about verifying functions are called at the right time with the right arguments, but the easy ability to set return values makes it easy to abuse them for other things (this abuse is good, but it is still abuse of the intent).
In this case you want a fake: a smart service that, when you are in a test, sets up a temporary directory tree containing all the files in the state that particular test needs, and destroys it when the test is done (with an optional mode to keep it - useful for debugging when a test fails). Depending on your situation you may need something similar for network services, time, or other such things. Note that in most cases the filesystem itself is more than fast enough to use in tests, but you need isolation from other tests. There are a number of ways to create this fake: overriding open, or having a GetMyProgramDir function that you override, are two that I can think of.
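A pytest-flavoured sketch of that kind of fake, assuming the production code resolves its base directory through one overridable place (the Paths.program_dir name here is made up):

    import json
    import pytest

    # Hypothetical indirection in the production code: all filesystem access
    # resolves its base directory through this one place.
    class Paths:
        program_dir = "/var/lib/myprog"   # read-only in production

    @pytest.fixture
    def program_dir(tmp_path, monkeypatch):
        # Build the tree this particular test needs, isolated from every other test.
        (tmp_path / "settings.json").write_text(json.dumps({"opt1": 1, "opt2": 2}))
        monkeypatch.setattr(Paths, "program_dir", str(tmp_path))
        # pytest keeps the last few base temp dirs around, which covers the
        # "keep it so I can debug a failure" mode.
        return tmp_path

    def test_reads_settings(program_dir):
        # The code under test would do something equivalent to this lookup.
        with open(f"{Paths.program_dir}/settings.json") as f:
            data = json.load(f)
        assert data["opt1"] + data["opt2"] == 3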
That means the test environment needs to be defined and versioned with the code.
Valgrind is a mock of standard library/OS functions and I think its existence is a good thing. Simulating OOM is also only possible by mocking stuff like open.
If the code's running in a space shuttle, you probably want to test that path.
If it's bootstrapping a replicated service, it's likely desirable to crash early if a config file couldn't be opened.
If it's plausible that the file in question is missing, you can absolutely test that code path, without mocking open.
If you want to explicitly handle different reasons for why opening a file failed differently, by all means, stress all of that in your tests. But if all you have is a happy path and an unhappy path, where your code doesn't care why opening a file failed, all you need to test is the case where the file is present, and one where it is not.
Modifying the file system's implementation would be. Including a valid_testdata.txt and an invalid_testdata.txt file in your test's directory, however, is not 'modifying the file system', any more than declaring a test input variable is 'mocking memory access'.
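For instance (parse_settings, its import path, and the error it raises are placeholders for whatever the unit under test actually does):

    from pathlib import Path

    import pytest

    from myapp.settings import parse_settings  # hypothetical unit under test

    TESTDATA = Path(__file__).parent  # fixture files checked in next to the test

    def test_accepts_valid_file():
        assert parse_settings(TESTDATA / "valid_testdata.txt") is not None

    def test_rejects_invalid_file():
        with pytest.raises(ValueError):
            parse_settings(TESTDATA / "invalid_testdata.txt")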
> don't want my daemons or user-facing applications to just crash, when a file is missing
If the file is important, crashing is the best thing you can do when implementing a non-user-facing service. The last thing you want to do is to silently and incorrectly serve traffic because you are missing configuration.
You want to crash quickly and let whatever monitoring system you have in place escalate the problem in an application-agnostic manner.
That is mostly wrong.
Valgrind wraps syscalls. For the most part it just checks the arguments and records any reads or writes to memory. For a small number of syscalls it replaces the syscall rather than wrapping it (for instance calls like getcontext where it needs to get the context from the VEX synthetic CPU rather than the real CPU).
Depending on the tool it can also wrap or replace libc and libpthread functions. memcheck will replace all allocation functions. DRD and Helgrind wrap all pthread functions.
    $ cat test.c
    void main (void) {
        malloc (1000);
    }
    $ make test
    cc test.c -o test
    $ valgrind --leak-check=full --show-leak-kinds=all -s ./test
    Memcheck, a memory error detector
    Command: ./test
    HEAP SUMMARY:
        in use at exit: 1,000 bytes in 1 blocks
      total heap usage: 1 allocs, 0 frees, 1,000 bytes allocated
    1,000 bytes in 1 blocks are still reachable in loss record 1 of 1
        at 0x483877F: malloc (vg_replace_malloc.c:307)
        by 0x109142: main (in test.c:2)
    LEAK SUMMARY:
       definitely lost: 0 bytes in 0 blocks
       indirectly lost: 0 bytes in 0 blocks
       possibly lost: 0 bytes in 0 blocks
       still reachable: 1,000 bytes in 1 blocks
       suppressed: 0 bytes in 0 blocks
> vg_replace_malloc.c:307

What do you think that is? Valgrind tracks allocations by providing other implementations for malloc/free/... .
Mostly it wraps system calls and library calls. Wrapping means that it does some checking or recording before and maybe after the call. Very occasionally it needs to modify the arguments to the call. The rest of the time it passes the arguments on to the kernel or libc/libpthread/C++ lib.
There are also functions and syscalls that it needs to replace. That needs to be a fully functional replacement, not just looking the same as in mocking.
I don’t have any exact figures. The number of syscalls varies quite a lot by platform and on most platforms there are many obsolete syscalls that are not implemented. At a rough guess, I’d say there are something like 300 syscalls and 100 lib calls that are handled of which 3/4 are wrapped and 1/4 are replaced.
Sorry that wasn't my intention. You are a Valgrind developer? Thanks, it's a good project.
It seems like I have a different understanding of mocking than other people in the thread, and it shows. My understanding was that Valgrind provides function replacements via dynamic linking that then call into the real libc. I would call that mocking, but YMMV.
I like Hynek Schlawack's 'Don’t Mock What You Don’t Own' [1] phrasing, and while I'm not a fan of adding too many layers of abstraction to an application that hasn't proved that it needs them, the one structure I find consistently useful is to add a very thin layer over the parts that do I/O, converting between types that you own and whatever the actual thing needs.
These layers should be boring and narrow (for example, never mock past validation you depend upon), doing as little conversion as possible. You can also rephrase the general purpose open()-type usage into application/purpose-specific usages of that.
Then you can either unittest.mock.patch these or provide alternate stub implementations for tests in a different way, with this approach also translating easily to other languages that don't have the (double-edged sword) flexibility of Python's own unittest.mock.
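A sketch of what such a layer can look like (names and shapes here are made up, not from the article):

    import json
    from dataclasses import dataclass
    from pathlib import Path

    @dataclass
    class Settings:            # a type you own, not "whatever json.load returned"
        opt1: int
        opt2: int

    class SettingsReader:      # the thin, boring I/O edge
        def load(self, path: Path) -> Settings:
            raw = json.loads(path.expanduser().read_text())
            return Settings(opt1=raw["opt1"], opt2=raw["opt2"])

    class StubSettingsReader:  # test double: same shape, no filesystem
        def __init__(self, settings: Settings):
            self._settings = settings

        def load(self, path: Path) -> Settings:
            return self._settings

Code above this edge only ever sees Settings, so tests can unittest.mock.patch the reader where it is used, or simply hand the stub to whatever needs settings.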
It does. And this is exactly the problem here!
> TFA: The thing we want to avoid is opening a real file
No! No, no, no! You do not 'want to avoid opening a real file' in a test.
It's completely fine to open a real file in a test! If your code depends on reading input files, then your test should include real input files in it! There's no reason to mock any of this. All of this stuff is easy to set up in any unit test library worth its salt.
That's okay for testing some branches of your code. But not all. I don't want to have to actually crash my hard drive to test that I am properly handling hard drive crashes. Mocking[1] is the easiest way to do that.
[1] For some definition of mock. There is absolutely no agreement found in this space as to what the terms used mean.
Then have your main function take in that json as a parameter (or class wrapping that json).
Then your code becomes the ideal code: stateless and with no interaction with the outside world. Then it's trivial to test, just like any other function that simply translates inputs to outputs (i.e. pure).
Every time you see the need for a mock, your first thought should be: "how can I take the 90% or 95% of this function that is pure and pull it out, then separate the impure portion (side effects and/or stateful), which now has almost no logic or complexity left in it, and push it to the boundary of my codebase?"
Then the complex pure part you test the heck out of, and the stateful/side effectful impure part becomes barely a wrapper over system APIs.
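Applied to the settings example from the article, the split might look roughly like this (a sketch, not the author's code):

    import json
    from pathlib import Path

    # Pure core: all of the logic, testable with a plain dict.
    def add_two_settings(settings: dict) -> int:
        return settings["opt1"] + settings["opt2"]

    # Impure shell: barely a wrapper over the filesystem, pushed to the boundary.
    def load_user_settings() -> dict:
        with open(Path("~/settings.json").expanduser()) as f:
            return json.load(f)

    def main() -> int:
        return add_two_settings(load_user_settings())

The pure function gets the bulk of the tests; the shell is thin enough that a single test against a real temp file covers it.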
If you introduce a mocking library to the test portion of the codebase, most developers will start to use it as a way to shortcut any refactoring they don't want to do. I think articles like this that try to explain how to better use mocks in tests are useful, although I wish they weren't necessary.
In practice the issues I see with this are that the "side effect" part is usually either extensive enough to still justify mocking when testing it, or intertwined enough with your logic that it's hard to extract all the "pure" parts. I rarely see functions that are 90-95% pure logic vs side effects.
E.g. for the first, you could have an action that requires several sequenced side effects, and then your "wrapper over APIs" still needs validation that it calls the right APIs in the right order with the right params, for various scenarios. Enter mocks or fakes. (And sometimes people will get clever and say use pubsub or events for this, but... you're usually just making the full-system-level testing harder, as well as introducing less determinism around your consistency.)
For the second, something like "do steps I and J. If the API you call in step J fails, unwind the change in I." Now you've got some logic back in there. And it's not uncommon for the branching to get more complex. Were you building everything in the system from first principles, you could try to architect something where I and J can be combined or consolidated in a way to work around this; when I and J are third party dependencies, that gets harder.
For instance, I once worked on payment-related code at a large online retailer. The steps I and J from your example would have been calls to the payment gateway's API (payment initiation, actual payment request). There was also a step K (polling for payment confirmation) and even a step K' (a payment confirmation callback the gateway might or might not call before or after we get around polling for the payment status ourselves). And often there was even user interaction in between (the 3DS/3DS2 credit card payment scheme that's common here in the EU). Every single one of those steps could fail for a myriad of reasons, e.g. time out, be rejected, … and we had to make sure we always failed gracefully and, most importantly, didn't mess up our payment or order records.
Of course this was an old enterprise Java code base, created by people who had long left the company, and all this had been written just the way you imagine it. It was an absolute mess.
Every single time I worked on this code base I secretly wished one of the original authors had heard of state machines, pure vs. effectful code, and unit tests.
It should be obvious, but this is not something that seems to be taught in school or in most workplaces, and when it is, it's often through the lens of functional programming, which most just treat as a curiosity and not a practical thing to use at work. So I started to teach this simple design principle to all my junior devs, because it is actually quite easy to apply, does not need a complete shift of architecture or a big refactor when working on existing code, and is actually practical and useful.
Separating I/O from logic makes a lot of sense and makes tests much easier to write and code much easier to reason about, but you'll still need to implement some sort of mocking interface if you want to catch I/O problems.
They addressed this concern already. These are not contradicting approaches.
You're just describing dependency injection, but if you say that, people won't want to listen cause doing that all the time sucks.
In the ideal case my tests start by writing some randomised data using the external API; I then update it (if applicable) using the external API, and finally read it, also using the external API, and compare the actual result with what I expected.
I use randomised data to avoid collisions with other tests, which might cause flakiness and/or prevent running the tests concurrently. I avoid having seed data in the database if at all possible.
It's the only approach I've found that can survive a major refactor of the codebase. Anything short of breaking the external API, which is typically a no-no anyway, shouldn't break these tests.
Doing a refactor and being able to rely on the test suite for finding bugs and inconsistencies is amazing. Of course they won't find 100% of all bugs, but this way at least you know that a failing test means there's a problem in your production code.
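In sketch form, with a made-up api_client fixture and endpoints standing in for whatever the external API really is:

    import uuid

    def test_user_roundtrip(api_client):          # api_client: hypothetical fixture for the real API
        name = f"user-{uuid.uuid4().hex[:8]}"     # randomised to avoid collisions with other tests
        created = api_client.post("/users", json={"name": name}).json()
        api_client.patch(f"/users/{created['id']}", json={"name": name + "-renamed"})
        fetched = api_client.get(f"/users/{created['id']}").json()
        assert fetched["name"] == name + "-renamed"   # write, update, and read, all through the public API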
In all seriousness, I have found this to be a useful suggestion, because the purpose of a test is to make sure invariants don't break in real code. When you mock the database, you're excluding large amounts of real code from test.
Say you've got a function that accesses a key-value store. Ideally, you can factor out the i/o so that you do all your reads up front and all your writes at the end, leaving a pure function in the middle. But if the code is too tangled up in its side effects for that, the next best thing is to create a fake KV store and then wrap the function like this:
    def doTest(input, initialState):
        kv = makeFake(initialState)
        result = doThing(kv, input)
        return result, kv.dumpContents()
doThing isn't a pure function, but doTest is. Now you can write tests like this:

    result, outputState = doTest(input, initialState)
    assert (result, outputState) == expected
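makeFake can be almost embarrassingly simple; one possible shape (it should mirror whatever your real KV interface looks like):

    class FakeKV:
        def __init__(self, initial):
            self._data = dict(initial)

        def get(self, key):
            return self._data.get(key)

        def put(self, key, value):
            self._data[key] = value

        def dumpContents(self):
            return dict(self._data)

    def makeFake(initialState):
        return FakeKV(initialState)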
I guess you could call that "imperative core, functional shell," lol.

You still need to write a test for how it all comes together, and you should write tests for your error handling. You need a mock to respond with an error.
    value, err := externalLibraryFunctionFoo(a, b)
    if err != nil {
        return nil, fmt.Errorf("calling foo: %w", err)
    }
then you probably don't need to test it. All you're doing is bubbling up the error handling from other libraries' functions.

What if someone comes along in the future and modifies that to add some complexity, or changes it to log and continue? Tests will catch that behavior.
    def get_user_settings() -> str:
        with open(Path("~/settings.json").expanduser()) as f:
            return json.load(f)

    def add_two_settings() -> int:
        settings = get_user_settings()
        return settings["opt1"] + settings["opt2"]
and the very first comment just below:

> The thing we want to avoid is opening a real file
and then the article goes around patching stdlib stuff etc.
But instead I would suggest the real way to test it is to actually create the damn file, fill it with the "normal" (fixed) content and then run the damn test.
This is because after years of battling against mocks of various sort I find that creating the "real" resource is actually less finicky than monkeypatching stuff around.
Apart from that: yeah, sure, the code should be refactored and the paths / resources moved out of the "pure logical" steps, but 1) this is an example and 2) this is the reality of most actual code, just 10x more complex and 100x more costly to refactor.
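For the code in TFA that can be as small as this pytest test (it relies on expanduser() reading HOME, so it's Unix-flavoured, and the import path is made up):

    import json

    from myapp.settings import add_two_settings  # wherever TFA's code actually lives (path made up)

    def test_add_two_settings(tmp_path, monkeypatch):
        (tmp_path / "settings.json").write_text(json.dumps({"opt1": 1, "opt2": 2}))
        monkeypatch.setenv("HOME", str(tmp_path))   # ~/settings.json now resolves into the temp dir
        assert add_two_settings() == 3              # real open(), real json.load(), real file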
You can create an actual mock networked service but it's much more work.
I think this is an example explaining what seems like a good practice for using mocks in python to me, the actual code in the post is barely "supporting cast".
I use a browser extension for scraping actual backend responses, which downloads them with a filename convention the mock server understands. I mostly use it for development, but also for setting up screenshot tests. For example,
    PATCH /select
    'api/user(locked-out).GET.423.json'

screenshot the app and pixel diff it

    PATCH /select
    'api/user.GET.200.json'

screenshot…

Can you tell the name of the extension?
This one runs the real request and saves the response, faking it later by returning what it saved instead of making the request again.
    f = () => a+b

refactor for easier testing:

    f = (a, b) => a+b

in your test you can now mock a and b

There are really only a few reasons to use mocks at all, like avoiding network services, nondeterminism, or performance problems. If you need to do a lot of mocking in your tests, that's a red flag and a sign that you could write your code differently. In this case you could just make the config file location an optional argument and set up one in a temp location in the tests. No mocking required, and you're testing the real API of the config file module.
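Something like this, say (a sketch of that suggestion applied to the article's function):

    import json
    from pathlib import Path

    DEFAULT_SETTINGS = Path("~/settings.json")

    def get_user_settings(path: Path = DEFAULT_SETTINGS) -> dict:
        with open(path.expanduser()) as f:
            return json.load(f)

    def test_get_user_settings(tmp_path):
        p = tmp_path / "settings.json"
        p.write_text('{"opt1": 1, "opt2": 2}')
        assert get_user_settings(p)["opt1"] == 1   # a real file in a temp location, no patching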
Because you are testing against implementation, not specification.
You’re welcome.
Presumably in the coverage case it's being called by a trace function, which inevitably runs during test execution, and while we want the trace function to be called during the test function, we really want it to run without any patches the test function is using. But this arguably requires both an ability for the trace function to opt out of patches and a way for the patcher to temporarily disable all of them.