David R. MacIver's Blog
Hypothesis for Django
On Tuesday I gave a talk to the London Django Meetup Group, who are a lovely crowd. The theme was (clue is in the title, but really what else would I be speaking about?) “Hypothesis for Django”. Aside from a few lightning talks and one or two really half-arsed barcamp sessions this was my first real public speaking. Given that, if I do say so myself it went unreasonably well.
Anyway, thanks to SkillsMatter who kindly recorded the event, the video is now up. For those who would prefer text (me too), the slides are at https://bit.ly/hypothesis-for-django, and I have written up a transcription for you:
Starting slide
Ok, right. So.
Who am I?
Hi. As per what I said I’m here to talk to you about Hypothesis for
Django.
I am David R. MacIver. The R is a namespacing thing. There are a lot of
David MacIvers. I’m not most of them.
I wrote Hypothesis. And I have no idea what I’m doing.
I don’t actually know Django very well. I write tools for people who
know Django much better than me but they’re the ones writing the Django
applications, it’s usually not me. So if I get anything wrong on the
Django front, I apologise in advance for that. If I get anything wrong on
the Hypothesis front, I really should know better but I’ve not actually
done a presentation about it before now so please bear with me.
What is Hypothesis?
So what is this Hypothesis thing I’m here to talk to you about?
It’s a testing framework [Ed: I hate that I said this. It’s a library,
not a framework]. It’s based on a Haskell library called Quickcheck...
and you don’t need to run away.
There’s apparently a major problem where people come to the Hypothesis
documentation, and they see the word Haskell, and they just go “Oh god,
this is going to be really complicated, I’m not going to do this right
now”, and they leave. I’ve spent a lot of time making sure Hypothesis is
actually very Pythonic. If you know Haskell, a few bits will look
familiar. If you don’t know Haskell, that’s really fine, you don’t need
to at all. I will never mention the word Monad again after this
point.
And the basic idea of this style of testing is that you write your tests
almost like normal, but instead of you writing the examples the testing
library does that for you. You tell it “I want examples that look
roughly like this”, and it gives you a bunch of examples that look
roughly like that. It then runs your tests against these examples and if
any of them fail it turns them into a smaller example that basically
says “Hey, you’ve got a bug in your code”.
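[Ed: for readers who haven’t seen this style of testing, here is a minimal standalone sketch of that workflow, independent of Django. The property being tested is my own toy example, not one from the talk:]

```python
from hypothesis import given, strategies as st

# We describe the rough shape of the input (here: any list of
# integers) and Hypothesis generates the concrete examples.
@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)

# Calling the decorated function runs it against many generated
# examples; any failing example is shrunk to a smaller one first.
test_sorting_is_idempotent()
```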
And it integrates well with your existing testing libraries. You don’t
need to use my own custom test runners with this. It works in unittest,
it works in pytest, it works in nose, it should work in anything but
those are the ones I tested it on. And of course, it works well with
Django. It both works with the existing Django unit test runner and
there’s also some specific custom support for it, which is what we will
be using today.
The Setup
So here is what we’re going to be doing. We have some Django project
that we’re testing the backend of and we have two models that we care
about. One of them is User, one of them is Project. User in this case
isn’t actually a standard Django auth user. It could be. It would work
perfectly well if it was; I just sort of forgot they existed while I was
writing the example. See “I don’t know Django”. And, basically, Projects
have Users collaborating on them and every Project has a max number of
users it is allowed to have. That would presumably in a real application
be set by billing, but we’re not doing that. We just have a number. And
if you try to add more users to the project than are allowed then you
will get an error.
And what we’re going to do is that we’re going to start from a fairly
normal, boring test using standard Django stuff that you’ve probably
seen a thousand things like it before. And first of all we’re going to
refactor it to use Hypothesis and in the process hopefully the test
should become clearer and more correctly express our intent and once
we’ve done that we’re going to let Hypothesis have some fun and
basically refactor the test to do a lot more and find a bug in the
process.
Code slide 1
Here is our starting point. Obviously, in any well tested
application this would be only one test amongst many, but it’s the only
test we’re going to look at today. We want to test that you actually can
add users to a project up to the limit, and this test would currently
pass even if we never implemented the limit in the first place; we’re
just saying we can create a project, it has a limit of 3, we add 3
users, alex, kim and pat to it and we assert after that that they’re all
on the project.
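[Ed: the slide’s code isn’t reproduced in this transcript. Based on the description, it might have looked roughly like this; the model names, add_user, and team_contains are my reconstructions rather than the actual slide code, and this needs a real Django project around it to run:]

```python
from django.test import TestCase

from myproject.models import Project, User  # hypothetical app layout


class TestProjectCollaboration(TestCase):
    def test_can_add_users_up_to_limit(self):
        # Concrete, hand-picked details: a name, three specific
        # users, and a limit of exactly 3.
        project = Project.objects.create(
            name="my project", collaborator_limit=3)
        users = [
            User.objects.create(email="alex@example.com"),
            User.objects.create(email="kim@example.com"),
            User.objects.create(email="pat@example.com"),
        ]
        for user in users:
            project.add_user(user)
        for user in users:
            self.assertTrue(project.team_contains(user))
```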
Like I say, you’ve seen tests like this a thousand times before, which
makes it easy to sort of fail to notice that it’s actually quite bad.
And the major problem with it is that it has lots of distracting details
that absolutely don’t matter for the test. The basic distracting details: a
project has a name, the users have email addresses, there are exactly 3
users and a collaboration limit of 3, and which of these details
actually matter? It’s completely not obvious from the test. It would be
really surprising if the project name mattered. It probably isn’t the
case that the user emails matter. It might be, though: they’re all from
the same domain, for example. Is there some custom domain support? Who
knows? Test doesn’t say. You’d have to look at the code to say. And the
real stinker is the 3. What’s special about 3? Again, probably nothing,
but often like 0, 1 and 2 are special cases so is 3 there because it’s
the first non special number? Who knows? Test doesn’t say.
Code slide 2
So let us throw all of that away. And what we’ve done here is we’ve
taken exactly the same test, we’ve not thrown away the 3 for now, we’ve
thrown everything else away, and we have said “Hypothesis, please give
me some examples”. And what happens here is we accept all of these as
function arguments and the decorator tells Hypothesis how it can provide
these to us. And we’ve told it that our final 3 arguments are Users,
the models function is a thing from Hypothesis that just says “Generate
me an instance of this Django model”. It does automatic introspection on
your models to figure out how to build them, but as you can see from the
Project example you can also override any individual one. Here we’ve got
a collaborator limit set to 3; just is a function that returns a trivial
strategy that always returns the same value. One final thing to note
here is that we had to use our own test runner. That’s due to technical
reasons with transaction management. It works exactly the same as a
normal Django test runner, it just does a little bit more that we need
for these to work.
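[Ed: a reconstruction of this slide, using the hypothesis.extra.django API as it was at the time of the talk; later Hypothesis releases renamed models() to from_model(). The model names and methods are my own stand-ins, and this needs a configured Django project to run:]

```python
from hypothesis import given
from hypothesis.strategies import just
# TestCase here is Hypothesis's Django test case, which handles the
# transaction management mentioned above.
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

from myproject.models import Project, User  # hypothetical app layout


class TestProjectCollaboration(TestCase):
    @given(models(Project, collaborator_limit=just(3)),
           models(User), models(User), models(User))
    def test_can_add_users_up_to_limit(self, project, u1, u2, u3):
        for user in (u1, u2, u3):
            project.add_user(user)
        for user in (u1, u2, u3):
            self.assertTrue(project.team_contains(user))
```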
And what will happen when you try to run this test is pretty much the
same thing that happened when we ran the previous version of the test,
except that it will run it multiple times with different instances
matching this. And unlike, say, a fixture which we could have used for
this, the details that aren’t specified genuinely don’t matter, because
if they’re not specified then Hypothesis will try something else as
well.
So this should hopefully be, once you’re familiar with the Hypothesis
syntax, a slightly clearer version of the original test which doesn’t
have any of those distracting details.
Code slide 3
We will just clean up slightly further en route to making it better yet
and getting rid of that three, and say
that rather than giving each of these a name, given that we don’t
actually care about their names now we’re going to ask for lists. And
the way this works is that we take our models(User) function and say
that we want lists of that. We can specify the min and max size, there
isn’t a precise size function but that’s fine, so in this case the
collaborators function argument is now being passed a list of precisely
3 users. And otherwise this test works the same way as before. We add
each collaborator to the project and then we assert that they are on the
team. Otherwise this is the same as the previous one, and in
particular the 3 is still there. Let’s kill the 3.
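[Ed: a reconstruction of this version, again with my own hypothetical names and the 2015-era models() API; it needs a Django project to run:]

```python
from hypothesis import given
from hypothesis.strategies import just, lists
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

from myproject.models import Project, User  # hypothetical app layout


class TestProjectCollaboration(TestCase):
    @given(models(Project, collaborator_limit=just(3)),
           # No exact-size option, so pin both min and max to 3.
           lists(models(User), min_size=3, max_size=3))
    def test_can_add_users_up_to_limit(self, project, collaborators):
        for user in collaborators:
            project.add_user(user)
        for user in collaborators:
            self.assertTrue(project.team_contains(user))
```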
Code slide 4
What we are doing now is that we have opened up the range of values that
the collaborator limit can take. We’ve told it that its minimum value is
zero, you can’t have fewer than zero collaborators, and its maximum
value is 20. The 20 is still a bit distracting, but it’s needed there
for performance basically. Because otherwise Hypothesis would be trying
to generate really massive lists, and this can work fine. It can
generate really massive lists, but then it will take forever on any
individual test run and then it’s running the tests on, depending on
configuration, possibly 200 times, you’ll probably want to configure it
lower than that, and that will just take ages and won’t do much useful,
so 20 is a good number. Similarly we’ve capped our lists of users at
length 20 because we don’t want more users than collaborators right
now.
And the only other interesting detail over the previous one is that
we’ve got this assume function call. And what this is saying is that we
need this condition to be satisfied in order for this to be a good
example. What this test is currently testing is what happens when there
are no more collaborators than the project limit; anything else isn’t
interesting for this test. And it’s more or less
the same thing as if we just said if this is not true return early, but
the difference is that Hypothesis will try to give you fewer examples
that don’t satisfy this and so that if you accidentally write your test
so that it’s not doing anything useful, Hypothesis will complain at you.
It will say “All of the examples I gave you were bad. What did you want
me to do?”. Again, otherwise this is pretty much the same as before. We
have a project, we have a list of users, we are adding users to the
project and asserting that they’re in afterwards. And the users must be
fewer than the collaborator limit.
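[Ed: the reconstructed version of this slide, with the opened-up limit and the assume call; names are my stand-ins and a Django project is needed to run it:]

```python
from hypothesis import assume, given
from hypothesis.strategies import integers, lists
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

from myproject.models import Project, User  # hypothetical app layout


class TestProjectCollaboration(TestCase):
    @given(models(Project,
                  collaborator_limit=integers(min_value=0, max_value=20)),
           lists(models(User), max_size=20))
    def test_can_add_users_up_to_limit(self, project, collaborators):
        # Only examples within the limit are interesting here;
        # assume() makes Hypothesis discard (and learn to avoid)
        # examples that don't satisfy the condition.
        assume(len(collaborators) <= project.collaborator_limit)
        for user in collaborators:
            project.add_user(user)
        for user in collaborators:
            self.assertTrue(project.team_contains(user))
```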
And this is, as far as I’m concerned, a better version of the test we
started with. It more carefully specifies the behaviour that you want,
and doesn’t have any of that distracting detail, and as a nice side
benefit when we change the shape of our models it will just continue
working. The test doesn’t really know anything about how to create a
model or anything like that. From that part, we’re done. This runs fine,
it tests...
[Audience question is inaudible. From what I recall it was about how
assume worked: Checking that what happens is that the two arguments are
drawn independently and then the assume filters out ones that don’t
match]
Yes. Yes, exactly. It filters them out and it also does a little bit of
work to make sure you get fewer examples like that in future.
And yeah. So, this test runs fine, and everything seems to be working. I
guess we’ve written bug free code. Woo.
Turns out we didn’t write bug free code. So let’s see if we can get
Hypothesis to prove that to us. What we’re going to do now is just a
sort of data driven testing where we give Hypothesis free rein and just
see what breaks. We’re going to remove this assume call and this code
should break when we remove this assume call, because we have this
collaborator limit and we’re going to exceed the collaborator limit and
that should give us an exception.
Code slide 5
So this is the change, all we’ve done is remove the assume.
Code slide 6
And we get an exception! And Hypothesis tells us the example, it says “I
created a project with a collaborator limit of 0, I tried to add a user
to it, I got an exception”. That’s what’s supposed to happen,
excellent!
Code slide 7
So let’s change the test. Now what we do when we are adding the user is
we check that if the project is at the collaborator limit something
different should happen. We should fail to add the user and then the
user should not be on the project and otherwise we should add the user
and the user should be on the project. We’ve also inlined the assert
true next to the adding because this way we can do each branch
separately, but that shouldn’t change the logic.
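[Ed: a reconstruction of the changed test. LimitReached, at_collaboration_limit, and the other names are mine, not the slide’s, and this needs a Django project to run:]

```python
from hypothesis import given
from hypothesis.strategies import integers, lists
from hypothesis.extra.django import TestCase
from hypothesis.extra.django.models import models

# Hypothetical app layout; LimitReached is my reconstructed name for
# the exception raised when the collaborator limit is exceeded.
from myproject.models import LimitReached, Project, User


class TestProjectCollaboration(TestCase):
    @given(models(Project,
                  collaborator_limit=integers(min_value=0, max_value=20)),
           lists(models(User), max_size=20))
    def test_limit_behaviour(self, project, collaborators):
        for user in collaborators:
            if project.at_collaboration_limit():
                # At the limit: adding should fail and leave the
                # project membership unchanged.
                with self.assertRaises(LimitReached):
                    project.add_user(user)
                self.assertFalse(project.team_contains(user))
            else:
                project.add_user(user)
                self.assertTrue(project.team_contains(user))
```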
Code slide 8
Now we run this again and Hypothesis tells us that our test is still
causing an error. And what’s happened here is that Hypothesis has tried
to add the same user twice, and even though we’re at the collaborator
limit, afterwards it’s saying the user is still on the project. Well,
OK, so the user should still be on the project, because the user started
on the project.
Code slide 9
So let’s just exclude that option from that branch and see what happens
now.
In the first branch all we’re doing is adding an extra condition saying
that we don’t care about that example; pass it through to the next bit.
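[Ed: a sketch of the changed loop inside the test; only the first branch’s condition is new, and the names are my reconstructions:]

```python
# The body of the test's loop, with the extra condition on the
# first branch: only expect an error for users who aren't already
# on the project.
for user in collaborators:
    if project.at_collaboration_limit() and not project.team_contains(user):
        with self.assertRaises(LimitReached):
            project.add_user(user)
        self.assertFalse(project.team_contains(user))
    else:
        project.add_user(user)
        self.assertTrue(project.team_contains(user))
```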
Code slide 10
Still failing. Same example in fact. Hypothesis will have remembered
this example and just tried it again immediately. [Ed: This isn’t
actually the case. I didn’t notice at the time but the email addresses
are different. I think the way I was running examples for the talk made
it so that they weren’t shared because they were saved under different
keys]. And what’s happening here is that we’re adding a user to a
project of limit 1, and then we’re adding them again. And it’s still
raising that limit reached exception, and we’re not really sure what’s
going on here. And the problem is that at this point Hypothesis is
basically forcing us to be consistent and saying “What do you actually
want to happen when I add the same user twice?”.
Code slide 11
So let’s look at the code now.
The code is very simple. If the project is at the collaboration limit,
raise a limit reached, otherwise just add the user to the project. And
looking at this, this is inconsistent. Because what will happen is that
if you are not at the collaboration limit this will work fine. Adding
the user to the project will be a no-op because that’s how many-to-many
relationships work in Django. But if you are at the collaboration limit,
even though the operation would have done nothing you still get the
limit reached error. And basically we need to take a stance here and say
either this should always be an error or this should never be an error
because anything else is just silly.
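[Ed: the code on the slide isn’t in the transcript; as described, it might look roughly like this. All the names here are my reconstructions, and it needs a Django project to run:]

```python
from django.db import models

from myproject.models import User  # hypothetical app layout


class LimitReached(Exception):
    pass


class Project(models.Model):
    name = models.CharField(max_length=100)
    collaborator_limit = models.IntegerField()
    collaborators = models.ManyToManyField(User)

    def at_collaboration_limit(self):
        return self.collaborators.count() >= self.collaborator_limit

    def team_contains(self, user):
        return self.collaborators.filter(pk=user.pk).exists()

    def add_user(self, user):
        # Inconsistent: if we're at the limit this raises even when
        # the user is already a collaborator, in which case
        # collaborators.add() would have been a harmless no-op.
        if self.at_collaboration_limit():
            raise LimitReached()
        self.collaborators.add(user)
```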
Code slide 12
We arbitrarily pick that this should never be an error. It should behave
like a no-op in all circumstances.
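[Ed: a sketch of the fixed method, again with reconstructed names (team_contains, at_collaboration_limit, LimitReached) rather than the actual slide code:]

```python
def add_user(self, user):
    # Adding an existing collaborator is always a no-op, so check
    # that first and only enforce the limit for genuinely new users.
    if self.team_contains(user):
        return
    if self.at_collaboration_limit():
        raise LimitReached()
    self.collaborators.add(user)
```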
Code slide 13
And we re-run the test and this time it passes.
It takes a little while to pass because it’s running quite a
lot of examples. Often what you do is turn the number of examples down
in development mode and then run this more seriously in the long
term.
And that is pretty much it for Hypothesis.
Obligatory plug
I have an obligatory plug, which is that I do offer training and
consulting around this library. You don’t need it to get started, you
should try and get started before you pay me, but then you should pay me
if you really want to. I am also potentially available for other
contracting work if Hypothesis doesn’t sound that exciting to you.
Details
And here are my details at the top. There is the Hypothesis
documentation. And there are the links to these slides, available
permanently on the internet for you to reread at your leisure.
Thank you very much. Any questions?
Comments
alfred werner on 2015-06-13 00:20:39:
Nice tool, nice talk. One small point, if you are in front of a group and answer a question, you should repeat the question (summarized) before answering so others in the audience and those watching the video will know what you’re answering.
david on 2015-06-15 15:09:12:
Thanks! Yeah, I realised afterwards that I’d forgotten to do the repetition thing and kicked myself for forgetting.
Using Hypothesis with Factory Boy | David R. MacIver on 2015-06-17 10:07:12:
[…] gave a talk on the Hypothesis Django Integration last night (video and transcript here). I got some questions asking about integration with Factory […]
Notes on Hypothesis performance tuning | David R. MacIver on 2015-07-17 08:41:02:
[…] concrete example being the one I cover in my Hypothesis for Django talk where the end result is probably a bit more complicated than what you would normally naturally […]