
Elm Town 52 – It's not like I have a stoplight on my desk
Aaron VonderHaar returns to the show for a deep dive on automated tests, test-driven development, and elm-program-test, a new high-level test framework for Elm.
Elm Town · Kevin Yank
Audio is streamed directly from the publisher (cdn.simplecast.com) as published in their RSS feed. Play Podcasts does not host this file. Rights-holders can request removal through the copyright & takedown page.
Show Notes
Aaron VonderHaar returns to the show for a deep dive on automated tests, test-driven development, and elm-program-test, a new high-level test framework for Elm.
Thank you to our sponsor, Culture Amp.
Special thanks to Xavier Ho (@Xavier_Ho) for editing and production of this episode!
Recording date: 2 April 2020
Guest
- Aaron VonderHaar (@avh4)
Show Notes
00:00:00 Intro
00:01:42 TDD and automated tests at NoRedInk
00:02:19 elm-program-test
- elm-program-test
- elm-test
- Capybara
- Selenium
- Test.Html.Query (was elm-html-test)
- Test.Html.Event (was elm-html-test)
00:04:36 Why write automated tests
00:06:44 Test-driven development
00:08:55 Tests vs types
00:11:33 Test-driven development (continued)
00:13:25 Red, green, refactor
00:16:18 Test-driven development (continued)
00:20:23 Testing at the right level
00:24:53 Testing culture in a team
00:26:43 The need for elm-program-test
00:30:22 The elm-program-test API
00:32:12 Standing in for the Elm runtime
00:35:34 Testing Elm commands
00:37:49 Standing in for the Elm runtime (continued)
00:39:46 Resolving and asserting on HTTP requests
00:43:08 Other supported effects
00:45:12 Modelling the user interface in your test suite
00:47:18 Smart DOM matchers
00:49:05 Keyboard focus tracking
00:49:48 elm-program-test vs non-Elm alternatives
00:51:53 Stability and feature-completeness
00:53:00 elm-program-test at NoRedInk
00:54:42 Testing the interface with the back end
00:55:58 Related talks by NoRedInk colleagues
00:56:53 Sign-off and outro
Transcript
[00:00:00] Kevin Yank: Hello and welcome back to Elm town. It's your old friend Kevin, and I am rejoined by Aaron VonderHaar back for his second episode. Welcome back, Aaron.
[00:00:09] Aaron VonderHaar: Hey, Kevin, it's great to be back.
[00:00:10] Kevin Yank: Aaron, you were with us a few weeks back now to talk about your work on elm-format, among other things, including a new refactoring tool that, we'll be releasing very soon, I imagine, by the time our listeners hear this. If you haven't caught that episode,
[00:00:28] go back two episodes and have a listen to that first chat we had with Aaron, because we will be picking up on the threads of that conversation here today with a focus on elm-program-test, and testing in general for Elm. Richard Feldman is not the only person that, NoRedInk who cares about testing.
[00:00:48] Am I to understand that right, Aaron?
[00:00:50] Aaron VonderHaar: That's right. Test driven, and having a lot of automated tests has been pretty core and NoRedInk, for everybody there. So, yeah, we all get into it now.
[00:00:58] Kevin Yank: I'm actually interested in hearing about the background of that, cause I would say that's maybe not true of every company that's using Elm today. But I'm aware that NoRedInk started as a Ruby on Rails application, where in that ecosystem, in the Ruby ecosystem, test driven development and automated testing in general is very strongly embedded in the culture of that programming community.
[00:01:23] Is there a sense that NoRedInk's investment in testing and belief in testing flowed out of that background in Rails or, not?
[00:01:30] Aaron VonderHaar: Yeah, I'd say that is definitely fair to say. RSpec is the big testing framework there, and that was definitely encouraged by Rails, the framework as well.
[00:01:40] Kevin Yank: And so I guess a lot of developers who are used to having a strong testing framework on the back end, they are invested in the idea of having that on the front end with Elm. And that leads to people like Evan (sic.) working on elm-test and yourself working on this new package, elm-program-test.
[00:01:59] Aaron VonderHaar: I mean, web apps in general these days, testing is highly thought of across a lot of JavaScript platforms as well. So it's not unique to Elm, but Elm kind of being a totally separate language, even though it works in the JavaScript ecosystem, kind of needs its own tools and its own way of doing things that supports the language itself.
[00:02:19] In a nutshell, elm-program-test is an Elm package that you can use alongside elm-test, and it lets you write tests that are maybe at a higher level of testing than what you can do with elm-test out of the box. So the types of tests you can write in elm-program-test are similar to maybe what you might write in tools like Capybara in Rails or Selenium where you write tests saying, okay, the user is loading this page,
[00:02:48] the user is going to click this button, fill in text in the input with this label, navigate to this other page, submit the form, and then check that certain things appear on the page afterwards. So writing tests at that high level of the user perspective, is what program-test is designed for. And I like to think I've made a pretty nice and easy-to-use API to let you do that in a powerful way.
[00:03:10] Kevin Yank: So if elm-test by itself is for unit testing, elm-program-test is for the other kinds of tests you're going to write.
[00:03:20] Aaron VonderHaar: Yeah. So certainly if you have a function or a module with pure functions in it, using the built in stuff in elm-test is the place to start. After a while, Noah Hall developed the elm-html-test, which is now incorporated into the elm-test library itself, that lets you check things on the HTML values that get produced by your views. elm-program-test goes a step further and actually lets you incorporate your init function, your update function and your view function and all the messages and everything into a single unit that you can just run user steps on and inspect the output of the view and kind of interact with your program and check what the user's going to see afterwards.
[00:04:03] Kevin Yank: So tests at the program level. Thus the name.
[00:04:05] Aaron VonderHaar: It took a while actually to get to that name, but I think it reflects the purpose now.
[00:04:10] Kevin Yank: I can imagine, cause as you were describing the type of tests it's for, those tests I've heard called acceptance tests and feature tests and end-to-end tests. And I imagine all of those terms, were on the table to name this framework. And in the end, you landed on, “Well, actually, what do we call the thing we're testing here in Elm?
[00:04:29] “We call it a program.” So yeah, it makes sense once I understand what it does.
[00:04:34] Aaron VonderHaar: Yup. Exactly.
[00:04:36] Kevin Yank: Hmm. Well, let's take a step back, here. We're going to get to the details of elm-program-test by the end, here, but I'd like to come at this, from kind of a first principles, “What are you trying to achieve and why?”
[00:04:48] question here, cause I dare say there are a lot of people who use Elm, who are listening to this, who use it as a hobbyist or, as a first language of this type, are exploring the space and are maybe, maybe not necessarily feeling the need for a test framework as the first thing they want to invest their energy in as they're exploring Elm.
[00:05:11] So tell me a little bit about your philosophy around testing. When should you start testing a piece of Elm code what should that look like?
[00:05:20] Aaron VonderHaar: If you're working on a production system where you have a job that depends on code that you're writing working with other code, and many people are touching it, and it's a system that's used by real people in that environment, there's certainly value in testing in that it protects you from bugs, it protects you from future regressions, that sort of thing.
[00:05:41] But there's another aspect to testing, which, I think last time we talked about some of my background, previously working at Pivotal Labs doing agile consulting, and the way I learned to use testing in development then, and specifically the test driven development practice, is an approach to kind of integrating writing tests and thinking about tests
[00:06:05] into the way that you actually write the code. And you can do that through test driven development in a way that helps you think about the design process for designing the code, both the architecture of your code and the implementation. So I'd say if you're working on a hobby project that maybe doesn't have the correctness requirements of a paid job, it's certainly up to the individual.
[00:06:30] But personally, I find a lot of benefit in the process of using tests to think about and educate and even help me find correct or even better solutions faster. So I kind of use it as a thinking tool as well.
[00:06:44] Kevin Yank: Right. So for someone who hasn't practiced TDD, or maybe is hearing the term for the first time, how would you describe that process and the experience of using it as a thinking tool, as you put it?
[00:06:56] Aaron VonderHaar: When you get into kind of the details of test driven development, one of the big things is to think about the context of the caller, the person calling your code if you're writing an API, or the user that's using your site if you are building an application. So the way it tends to play out for me is I'll have some feature that I want to implement.
[00:07:19] Rather than just looking at it as a specific list of things to implement and make work, I want to look at that feature and think, okay, who is going to be using this feature? In the case of NoRedInk, we have a lot of teachers and students, so we might have. Like a student leader board for a class. We think about who is a teacher, who's going to be using this feature, what's going through their mind when they come to interact with it?
[00:07:44] And then from that you kind of walk through a scenario of a real teacher in your mind, actually using the feature and try to write tests from that. Kind of the process of TDD is you think about, okay, if you tried to go through this scenario as a user, what's the first thing that the site currently doesn't do that you would run into in trying to go through the workflow and that's your first test.
[00:08:10] You write a test saying, okay, as a teacher want to see the leaderboards for my students. That's your first test, and you'd say, okay, what page is that going to be on? The body of the test would be load the page where it's supposed to be, stub out some data about the class and maybe the grades in the class, and then check that certain things appear on the leaderboard, which are the things that the teacher would be looking for,
[00:08:35] like maybe the name of the first student and the number of points that student has.
[00:08:39] Kevin Yank: Assuming that page doesn't exist yet, that test will immediately fail.
[00:08:43] Aaron VonderHaar: Yup, exactly. And in the case of a strictly type language like Elm, it may not even compile yet. So, you can often think of compiler errors as a sort of test failure in a way.
[00:08:55] Kevin Yank: That does get at something, which I've heard said and said myself a number of times over the years, which is having a strongly typed language, like Elm means you have to write fewer tests. Is that something you believe?
[00:09:09] Aaron VonderHaar: I would say that is sort of true. Yeah, I definitely know that argument of types versus tests. there was an interesting way I saw of thinking about this, where in a functional language functions are types of values in a sense, like you can have a variable that contains a function. So when you're writing tests, you can think of that as reducing the number of possible functions that exist that could be implementing the type annotation that you've defined for that function.
[00:09:42] So in that sense, there's some similarities between tests and types in that by changing the type annotation, you could restrict what kind of inputs the function can take or what kind of outputs it can produce, that reduces the number of possible implementations you could have for that function, which hopefully reduces the number of possible incorrect implementations you could have.
[00:10:04] And the same is true for tests. By adding a test, you are in a different way and using different tools, reducing the number of possible valid implementations, for whatever the function is.
[00:10:15] Kevin Yank: All right. And so if your goal is to, first narrow the possibility space of what this code might do, if you're using a type to do that, maybe that's one less test that you have to write. But speaking for myself, those would be the, probably the lowest value tests I would be writing in another language.
[00:10:34] And the higher value tests are still worth writing.
[00:10:36] Aaron VonderHaar: Yes, exactly. So like an example is in JavaScript, let's say, you might want to write a test saying that when given these inputs, the particular function does not return null. It always returns some default value or something like that, in case of an error. In Elm, that's an example of something where the type can handle that.
[00:10:58] You can enforce in the types that whatever the function will never return null. But also on the other side of the coin, in JavaScript, you aren't always writing a test that your function does not return null. For every function, certain functions, it makes sense because you know it's a risk depending on the inputs, but other functions, you know, okay, it's never going to come up,
[00:11:19] or if it does come up, it's obvious what's wrong, in which case you don't necessarily need to write those tests. But certainly in Elm, there is a lot of safety that you can get basically for free or just for thinking about the types. Whereas in JavaScript, you don't have that protection.
[00:11:33] Kevin Yank: So back to this TDD scenario, we've created a test to where the teacher wants to go to the page and see the scoreboard and, the compiler fails, or if you happen to luck out and the compiler passes, the test fails because the page doesn't exist. What next?
[00:11:50] Aaron VonderHaar: I tend to think about different levels. So the test we just talked about is a fairly high level test. Maybe you heard the words high level when I talked about elm-program-test.
[00:12:00] Kevin Yank: Yeah. I was going to say, this sounds like the kind of test you would write without elm-program-test.
[00:12:04] Aaron VonderHaar: Right, exactly. And that's in fact kind of why I wrote elm-program-test to fill that gap. But even if you aren't writing that test, you can think about that as a test case anyway and say, okay, I'm going to start my local development server. I'm going to login.
[00:12:19] I'm going to try to go to that page. You see it fail. In a sense, you can think of that as a manual test case. In either case, the next step would be, okay, why doesn't it fail? We've already diagnosed that is because the page doesn't exist. Okay, we'll go create the page. If there's any logic related to the creation of the page that's non-trivial,
[00:12:39] we might want to write a test for a lower level detail. For instance, maybe we need to decode some page flags, for loading the page or something like that. This is getting a little bit contrived as a first page for a new test, but I think about it in layers where I write this kind of high level test describing the user behavior,
[00:12:59] then I see what's failing. I think about why it's failing, and then I figure out, okay, what code needs to exist for that first step to be able to move on. And sometimes at that point I go down and write a lower level or a more unit test on some specific module maybe to implement some function that I'm going to need to wire up the test at the higher level.
[00:13:24] Kevin Yank: Often when talking about TDD, we talk about the red, green, refactor cycle. Is that something that is still a real part of your workflow today or is that more like something you learn on your first day of TDD, but you kinda let go of the details of that process over time?
[00:13:40] Aaron VonderHaar: It's definitely something I keep in mind, and I feel like I follow that pretty consistently, but also it's not like I have a stoplight on my desk, like, to remind me—
[00:13:51] Kevin Yank: What mode am I in? Yeah.
[00:13:53] Aaron VonderHaar: Right. Yeah, I think it's something I tend to maybe reflect back on if I am in a position where I'm getting stuck and I can think, okay,
[00:14:04] let me run through kind of my checklist of what I need to do. Do I understand what behavior I'm trying to implement right now? Am I trying to take too big of a bite right now? Am I getting lost in the weeds? Is there some simpler version of the problem that I could focus on making work right now?
[00:14:23] An example that comes up for me a lot is, okay, maybe I'll have a failing test and I know that to make that test pass, it's going to require both maybe writing some information into the database and then reading it out later. I guess this would be in the case of a Ruby test where we're on the back end. But often you'll have a test failure where to make it work, you have to do something to both change what's getting stored internally and something that's being used, reading that internal state to produce some outcome. And often what I'll do in that case is I will make the test pass by just hard-coding the output value somewhere in the code, which the way that helps is that it helps me find the exact location in the code where I'm going to need to have some logic to decide what to output.
[00:15:14] And then often that'll make it obvious to me what specific value I need stored in the internal state that I need to be able to use to make that determination. So it kind of breaks the overall problem into. The output and the input step and you can tackle those separately.
[00:15:30] Kevin Yank: For those who may not have heard of red, green refactor, the idea is you first write a failing test and your tests are then red because they fail. You then write the necessary implementation to make that test pass, and the tests turn green, and then having that passing test suite, you are then allowed to refactor your code to try to make the structure of it a better expression or a more maintainable version of the set of features that your tests guarantee will exist.
[00:16:01] And as long as your test say green, you know, you haven't broken anything, and when you're happy with the shape of your code, it is time to write another failing test before you make your code do anything new. And so you go through that red, green refactor cycle all the way to success. Now, in your example of the scoreboard for the teacher, you mentioned hard-coding
[00:16:23] a value. And the thing that came to mind was, okay, as we are writing this scoreboard screen, we might start with the empty state. So there are no students have taken this test. So there are no scores yet. And so the first test I might write is I go to the scoreboard screen and it says “no scores”.
[00:16:44] And the easiest way to make that test that is currently failing pass is to hard-code a view that outputs the HTML “There are no scores” – at all times – and your test turns green. And the idea of red, green, refactor is before you then go on to add more features to this page, you need to add another test that says, oh, and if there happens to be some data in the database, then I get a different result on the page.
[00:17:13] And you add that feature while ensuring that the old empty state continues to work.
[00:17:19] Aaron VonderHaar: Yup. Exactly. And you were asking about how strictly I follow that. An interesting optimization for the example you just laid out is something that actually I learned to do from a product manager. But the idea is that, okay, the empty state seems like a natural starting case, but if you try to think about, okay, can I make my first test be a test
[00:17:44] that is the most common test that's going to happen in real life. So in this case, okay, most of the time teachers are going to have students. So if you make your first test be having actual data, it can kind of be a shortcut in a sense because it forces you to drive out more of the implementation upfront,
[00:18:02] and also gives you some flexibility from the product side to maybe descope the fine tuning of the empty state later, if you want to get a feature released sooner.
[00:18:11] Kevin Yank: One of my pet peeves is opening a test file and having to scroll past two pages of really artificial, edge-casey tests before I get to the meat of what does this thing actually do,
[00:18:23] so yeah, I like that approach. When I find myself leveraging the rigorous version of TDD is when I'm writing a piece of code that I don't necessarily fully understand yet, or I can't quite hold it in my head.
[00:18:42] Richard Feldman has done a number of talks about how, you want to get your data representation right: he's given a talk about, you know, focus on designing data structures; he's given another talk on making impossible states impossible. And all of those things come back to getting the data structure right and everything flows out of there.
[00:19:03] But every now and then I am solving a problem where I go, wow, I honestly don't know what the right data structure is going to be for this. There's just one too many features to support with a single Elm type definition, and how am I going to implement those constraints? How many of them will be provided by the type system?
[00:19:23] How many of them will be enforced by the functions that act on that opaque type? And if I don't know, that's when I break out TDD and I start going, all right. One requirement at a time, here. Start with what is the most important thing this has to do. Get a green test that guarantees it will always do that, and then start thinking through edge cases,
[00:19:43] start adding constraints one at a time. And invariably my implementation code starts to get ugly at that point, and forcing myself to wait until my test suite is green before I start messing with alternative approaches is really valuable. It kind of forces me to slow down, to prove to myself that an idea I'm experimenting with, actually satisfies all of the constraints that I can't hold in my head all at once yet.
[00:20:12] Aaron VonderHaar: Yeah, that's a great example. And a situation that is easy to run into when you aren't quite sure what data structure or what algorithm you should be using, internally, a lot of times it's easy to do what I was describing earlier and jumped down to testing at the lower level too soon.
[00:20:30] And you might start writing unit tests that are very specific to the data structure that you chose, in which case, if you later decide you want to totally change that data structure, you now have a whole bunch of tests that were tied to that old data structure that you now have to throw out or rewrite in terms of the new one.
[00:20:50] So that's an example where if you have that uncertainty, it's good to be able to identify that and have your testing at a little bit higher of a level. Think about kind of the high level API that you're trying to implement and what behaviors and outcomes are being observed on the system as a whole.
[00:21:08] So if you write your tests at that level, it gives you the flexibility to entirely change how it's all implemented, but still have those tests providing coverage without any risks the tests will be invalid.
[00:21:20] Kevin Yank: So what I'm hearing is want to work from the outside in. Before you start defining the criteria for a tiny module in the middle of your program, you want to start by justifying its existence at the outer levels. And so everything starts by an elm-program-test test that says, this feature should do this thing for this user under these conditions.
[00:21:43] You work your way in to the point where you are writing a single module in support of that.
[00:21:49] Aaron VonderHaar: Yeah. Be able to think about and identify the different levels that you could test at, and make a choice that's the correct one for whatever situation you're in, about how many tests at which level you want to do. We actually have some variants at NoRedInk. We have at the moment, two different teams that are focused on building user facing features, and do most of the Elm work at the moment.
[00:22:14] I'm on one of those, and my team tends to, on the one hand, have less involves features. Like we do a lot of user interface work for the teachers and changing the overall not very deep features, but features across the site. Let me give an example. So like, our team is responsible for like the dashboard that shows all the assignments or the grade book that shows a summary of all the grades, which can have a lot of Elm code, but it's not a particularly algorithm-heavy code.
[00:22:48] Whereas the other feature team has been working on, like our new self-reviews feature, which is this entire interactive assignment with multiple steps. And you kind of worked through an essay section by section where there's a little quiz and you can highlight stuff. So there's kind of a state machine involved on the back end.
[00:23:07] We're using a rich text editor that we had to integrate with Elm, be able to merge updates if there's conflicting changes in multiple browser tabs, things like that. So that's kind of like a deeper feature, and that team ends up doing a lot more of the more focused unit tests compared to my team where we have a lot more features that are—
[00:23:27] We only need to go as deep as kind of thinking about the user experience because the code to wire it up isn't that complex from an algorithmic sense.
[00:23:36] Kevin Yank: You were saying that informs the testing practice of your respective teams.
[00:23:41] Aaron VonderHaar: Yeah I like to always start by thinking about what's the highest level test, but there are tradeoffs on the spectrum too. Like if you test too high, one thing is your test can be slow if, for instance, you're using Capybara or Selenium or something like that.
[00:23:56] Luckily on the Elm side, even elm-program-test is very fast even for high level tests. But the other downside of high level tests is it can be harder to debug what is exactly wrong when a test fails. You might get a failure like, “Oh, the, the button with the submit text doesn't exist on the page.”
[00:24:15] But it doesn't tell you why it doesn't exist. Whereas having some lower level unit tests gives you some brittleness in that a lower level interface is now protected by tests, and if you need to change the API of that module, your tests will break or have to be rewritten, but it gives you the benefit that if one of those lower level tests fails, it's going to be a much clearer idea of like where exactly the problem is in your code.
[00:24:43] So there's kind of a balance you need to, you need to make between how many of each kind of test are appropriate for whatever your situation is and the experience level of the people on your team, and so forth.
[00:24:53] Kevin Yank: How do you and your team think of and talk about that balance? Are there established norms? If you're reviewing a code change that deviates from those norms, it will raise your eyebrows and cause you to question it. Or is there any tooling around things like code coverage and things like that?
[00:25:11] Aaron VonderHaar: Yeah, that's an interesting question. I believe there's a one or two Elm code coverage tools, but I haven't personally played with them, and we don't use them consistently at NoRedInk
[00:25:22] Kevin Yank: The thinking behind the question is, for me, this is one area where front end tests and back in tests differ significantly. Maybe it comes back again to the difference between a dynamic language like Ruby or a statically type language
[00:25:37] like Elm. But on the back end, it is way more common in my experience to have the, that tooling to go the code you are contributing must have a minimum level of test coverage before it is considered acceptable. Whereas on the front end. At least at Culture Amp, we kind of trust our engineers to think about and make good decisions about that and we trust them to hold each other accountable to that in code reviews.
[00:26:05] But it is a much more informal process that is difficult to set clear expectations around in a growing team.
[00:26:11] Aaron VonderHaar: Yes. Yeah. NoRedInk is definitely on the informal side there. Personally, my recommendation is just that aligning a team on that is pretty much like any other team process, it's going to require good communication, as much communication as possible, typically more than you think is going to be necessary.
[00:26:33] And ideally pair programming is a great way to really get people to talk about differences of opinion that they didn't realize they had until they started working together.
[00:26:43] Kevin Yank: Let's start to talk about elm-program-test a bit. I said at the beginning that this was new to me, but obviously this, this framework has been around a little while to, to get to a third major version, unless, you know, that is just semantic versioning at play and you had to put a couple of breaking changes in there, early in its life.
[00:27:02] But, was there a time when NoRedInk was shipping Elm code without the ability to write program level tests in Elm?
[00:27:12] Aaron VonderHaar: Even today, we still have a fair amount of our front end code does not have tests at that level. html-test itself was kind of an early attempt at answering the question of what kind of tests are appropriate and necessary to write for front end code in Elm. That turned out, in my opinion, to not be particularly useful for applications because you ended up unit testing your view function, which
[00:27:40] by itself, the view function is very declarative. You tend to not have a lot of complicated logic in your view function. So where Elm programs get interesting is the interaction between the messages that can get produced by the view or through other means and how the update function processes those, and in particular when you start having a sequence of interactions, how the state of the program evolves over time, as a sequence of messages gets played back.
[00:28:10] So being able to kind of test the view function and the update function together as a unit, and have a single test. So in our case, the origins of elm-program-test came when we were redesigning the assignment form at NoRedInk where teachers can create assignments and we have seven different kinds of assignments, all with different kinds of contents.
[00:28:32] So the page itself was pretty complicated. And having only unit tests kind of available in elm-test at the time, you tend to get very, a very confusing set of tests when you have a test that reads like, okay, given that the teacher has already selected these 10 options in this way, then we're rendering a view that has a button that when clicked, produces this message, and then you have a separate test saying that, okay, given that you have this initial state, when this message is processed, here's the next state.
[00:29:06] And it's just very hard to read and hard to understand and just grows very, very fast to write tests that way when you have a complex page.
[00:29:14] Kevin Yank: You end up with a bunch of isolated tests about a single transition from one state to the next, and the big picture of what is the system supposed to do gets lost. And bugs can fall into those cracks.
[00:29:26] Aaron VonderHaar: Yup. Exactly.
[00:29:28] There was kind of a choice of like, okay, do we just reduce and slow down on the amount of tests we're writing and kind of step away from having as much coverage of this page? Or do we continue to write tests that way? And to me there was the other solution of how can we write a testing API that makes tests that we want to write and that are easy to write and read easier to produce? And actually my background at Pivotal Labs, I've worked on a bunch of test framework API in the past. There's the Robolectric framework for Android, and we did an experimental one while I was there for iOS apps. That was kind of a similar type of high level API where you could specify interactions with the user interface and see what kind of effects it produced.
[00:30:15] So having that background, it was natural to me of like, oh, we just need to start writing a new API to, for this.
[00:30:22] Kevin Yank: So let's talk about that API. How do elm-program-test test suites look different from the test suites that people might already be writing with elm-test.
[00:30:31] Aaron VonderHaar: Typically in elm-program-test, you are going to typically define a top level definition called start, I like to call it, which basically sets up your program. It needs to specify the init function, your update function, your view function – and we'll talk about, commands and how to test commands and subscriptions in a minute –
[00:30:52] but you set that all up in the first place and that produces a value of the type called ProgramTest, which basically represents your program and also its current state in the test world. And it also tracks other things like, any pending HTTP requests or other effects that are being simulated, by elm-program-test. So your typical test case is gonna say start, and then the next line you'll pipe it to some test program command, typically clickButton or fillIn and you'll say like, oh, clickButton "next", or fillIn and you'll give the label of the input field to fill in and the value to put in. And then there's another set of functions, that all start with the word expect, or there's an ensure version if you need to do multiples.
[00:31:42] So you can say, ensure page has, and then give some html-test selectors, like ensure that the page has a certain piece of text on it or ensure that there's a heading with certain text in it or ensure that there's a button with this class on it, and so forth. So you basically just string a bunch of those together,
[00:32:00] and if you want, you can alternate – click some things, check things on the page, click other things – and kind of go through an entire workflow as much as you feel is appropriate for whatever particular test you're trying to set up.
[00:32:11] Kevin Yank: It sounds to me like there would have been a lot of work in modeling the outer layer of that system that you are testing there. like most Elm code that we write, we get to assume there is an Elm runtime there doing all of the things that we ask it to do by returning it commands, and we also assume there is a browser there that is, you know, maintaining the DOM and, sending us messages when the user interacts with it.
[00:32:40] And in a way to write the kind of tests you write, your testing framework kind of needs to step outside of that system and poke at it from the outside. So does that mean you have had to basically emulate the Elm runtime itself and, at least a simplified version of a web browser?
[00:33:00] Aaron VonderHaar: Yes, to some degree. And a big answer to that is something I learned from working on the Android test framework I mentioned, Robolectric, is you can kind of be as a TDD approach here to where your test framework only needs to implement the features that people actually need for their tests that exist so far.
[00:33:20] So there's a lot of, for instance, web APIs that are available in Elm that elm-program-test doesn't support yet, but it's kind of set up in a way that that can be added, hopefully if people like this and have unusual, effects that they need to simulate and so forth. Hopefully people will start contributing some PRs to add and flush out the API.
[00:33:42] But it's definitely been an incremental approach in developing it. And I think towards the end of last year, I decided, okay, I'm going to make an effort to add good documentation to that, clean up the API, and it was actually version three that was the first that I started publicizing it.
[00:33:57] Kevin Yank: I'm trying to get my head around some aspect of it that I think you're going to clarify when you explain how you test commands and update functions and subscriptions. But the thing that's going through my head is, if my program inside of an elm-program-test suite makes an HTTP request, does that HTTP request actually occur by default? Are those things, mocked or stubbed by default or by exception?
[00:34:23] Aaron VonderHaar: Oh yeah, that's an interesting question. It comes down to a subtlety in Elm where when you use the HTTP library in Elm and you call get or post or whatever, you get back a command. What's interesting to note is that the commands in Elm, when you call a function that returns a command, it hasn't actually done anything yet.
[00:34:48] The command basically represents a piece of data that when the Elm runtime receives that command from your update function, it interprets that and does the actual effect. So, stemming from that, when you run elm-test, there's basically no mechanism for commands to get to the Elm runtime. Kind of, the test runner is hiding all of that.
[00:35:09] So to answer that question, any commands you produce, HTTP requests, when you run them in tests, there's in fact no way for them to actually get executed and actually performed as real HTTP requests.
[00:35:23] Kevin Yank: Okay, so elm-program-test isn't breaking any new rules. There are no runtime environment exceptions specifically for this test framework is what I'm hearing.
[00:35:34] Aaron VonderHaar: So being able to test commands, program-test provides a way to do it, but it is certainly not the ideal. So the way it works now, and actually, on the elm-program-test documentation, I kind of wrote a guidebook of walking through step by step how to do what I'm about to describe.
[00:35:51] But basically at the moment, because Elm has command as an opaque type and there's really no way in Elm code to be able to take a command and look at it and figure out what it represents – only the Elm runtime can do that. So out of necessity at the moment, the way elm-program-test works is that if you're going to use it to simulate commands or subscriptions, you basically have to refactor your program to define a new data type that you define, that lists out all the possible, commands that your program can produce.
[00:36:31] I tend to call that type Effect and it's just a union type that you would define. And then you have a separate function that takes values of that new type and turns them into the real commands. And then on the testing side, you have to provide a separate function that basically looks the same, that takes your new type and turns it into the simulated commands that elm-program-test can read.
[00:36:57] So that's the way it works now, but I believe Richard actually talked about this in the episode he was in, about looking at ways for elm-test itself to provide some JavaScript code that would be able to eliminate that step and allow elm-test to be able to inspect commands,
[00:37:15] which would support features of elm-program-test and be able to do everything you can do now just with a bit less boilerplate and without having to refactor your current program before you can test commands and so forth.
[00:37:27] Kevin Yank: For listeners who want to go back and hear the last time Richard was on talking about elm-test, that's Elm Town episode 46, in September 2019. He was on to talk about the release of Elm 0.19.1, and we talked a little bit about what was coming in elm-test and it definitely is forming a big picture now, seeing elm-program-test landing.
[00:37:49] Aaron VonderHaar: You asked earlier about the complexity of simulating the Elm runtime. With the current solution, it's relatively straightforward. Basically, I have, a union type defined internally in elm-program-test that defines all the possible commands that can be simulated at the moment, which is a growing number, but basically HTTP, the sleep timer and a couple of others. I guess you can deal with tasks and things like that.
[00:38:17] But actually previously, another precursor to elm-program-test, I was separately looking into how to test commands and have this thing in called elm-testable, where I was actually trying to write native code that could inspect the internals of commands and figure out what they were and destructure them.
[00:38:36] And in that version of testing, I actually went through the work of writing Elm code that simulated the effect managers that Elm has internally and, like, queuing up all of the events that would happen in the event manager queue, and processing those. So that was actually a lot of complexity, but ended up not really being needed
[00:38:59] with this new approach, as long as you provide a way to tell elm-program-test what commands you're trying to produce. It's relatively straight forward to implement that internally now.
[00:39:10] Kevin Yank: I like that pattern of kind of replacing all of your commands with your own custom inspectable commands that get converted to real Elm runtime commands kind of at the outer layer of your program. I'm understanding that you are able to make these kind of transparent-to-you versions of Elm commands, and that leads you then to a process of stubbing out the responses to HTTP requests, for example. What does that end up looking like in practice? What are the kind of features that you need to provide test versions of to a typical Elm program?
[00:39:46] Aaron VonderHaar: So inside of the Elm program, state or like the internal state of a running program, there is a couple of things that are tracked now. So one is kind of a list of, or a dictionary of all of the outstanding HTTP requests. So whenever your update function produces an effect that maps to an HTTP command, all elm-program-test does at that moment is track it as a new pending request.
[00:40:15] What that means is then after you say, click a button that results in your update function, posting an HTTP to an HTTP endpoint, then after that, at any point – and you don't have to resolve it right away – but at any point after that, in your test, you can either assert that a request was made to a particular endpoint. If you want to, you can assert on things in that request. If you want to assert on the body of the request or the headers, something like that. And there's also a mechanism for stubbing a response. So you can either say that the request errored with some particular HTTP error, you can provide an okay value, with the status code, with the response body,
[00:40:58] and basically at any moment after that. So if you have multiple requests in flight, you could have a test that resolves them in different orders, if that's the case, if you want to test for and so on.
[00:41:07] Kevin Yank: Do you need to explicitly handle requests after they have been queued or is there the ability to, for example, set up some request handlers that say, if I get a request that goes to this URL, it will receive that response. I don't care in any given test if that has actually occurred or not,
[00:41:27] I care about the effect of that response having come back that the user can observe. That's a way I've seen these kinds of tests written in the past is you set up the system first and then you knock it down by playing at the user's interactions and expectations.
[00:41:41] Aaron VonderHaar: Yeah, that's a good question. That was actually the alternate API that was considered when I was first looking at what kind of API to design to expose this. In looking into it, I actually determined that that type of API, where you define your handlers upfront, can actually be built on top of the API that currently exists, where you let things get queued up and then you can resolve them later.
[00:42:08] So the API you're describing doesn't exist right now. It could. There's some interest in it. I think I have probably a GitHub issue, tracking that. So that's something I'd like to build in the future. But, it does not currently exist, but totally could and could be built in a particular test helper, for an individual project if you needed.
[00:42:26] Kevin Yank: Yeah. If I really wanted that, I could model a set of request handlers and then at a point in my test suite, say resolve all requests or something like that.
[00:42:36] Aaron VonderHaar: The tricky part in making an API for it tends to be that it needs to be flexible enough to allow, for instance, dynamic responses depending on content in the request, but not everyone needs that level of complexity. So how do you make an API that's simple for regular usage, but also is flexible enough?
[00:42:56] So that's kind of the API design issue there in making a general solution. But yeah, certainly I'd encourage anyone that wants that, it could be done, as you just described, using helper functions within your project.
[00:43:08] Kevin Yank: I'm sort of mentally going down my internal list of effects that I know that the Elm runtime can handle. And is an HTTP request an example of a particularly complex one? Are there harder ones that you've had to support? I'm thinking, okay. There's random numbers. There's ports. There's viewport queries now in Elm 19. Ar