Haskell Refactoring
When we write Haskell, we love to talk about the correctness of our code. The focus on equational reasoning lets us reason about our code. This is an amazing thing to have in our toolbox when we have a piece of code we want to analyze, but what about when we want to change some piece of code?
We often use the term refactoring to loosely mean "changing a program", but its original definition requires that the new code is functionally equivalent to the old code. If we were to model this in Haskell, we'd get the following:
data Program = Program
run :: Program -> Input -> Output
type Refactoring = (Program -> Program)
-- A given Refactoring r must satisfy the following property
-- run program input ≡ run (r program) input
If we require our refactorings to have this property, we know that we did not accidentally introduce a bug (or accidentally fix a bug!) whenever we apply one. This means it's always safe to apply a refactoring, and we don't need to think about its correctness, only how it impacts the design.
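To make the property concrete, here is a minimal sketch that uses a toy expression type as a stand-in for Program. The names Expr, simplify, and preservesBehaviour are hypothetical and exist only for this illustration, and checking a handful of inputs is evidence rather than a proof:

```haskell
-- Hypothetical toy model: a "program" is an arithmetic expression over one input.
data Expr
  = Lit Int
  | Var            -- the program's input
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

run :: Expr -> Int -> Int
run (Lit n)   _ = n
run Var       x = x
run (Add a b) x = run a x + run b x
run (Mul a b) x = run a x * run b x

type Refactoring = Expr -> Expr

-- Remove additions of zero and multiplications by one.
simplify :: Refactoring
simplify (Add a b) = case (simplify a, simplify b) of
  (Lit 0, b') -> b'
  (a', Lit 0) -> a'
  (a', b')    -> Add a' b'
simplify (Mul a b) = case (simplify a, simplify b) of
  (Lit 1, b') -> b'
  (a', Lit 1) -> a'
  (a', b')    -> Mul a' b'
simplify e = e

-- Check `run program input == run (r program) input` on sample inputs.
preservesBehaviour :: Refactoring -> Expr -> [Int] -> Bool
preservesBehaviour r program = all (\input -> run program input == run (r program) input)
```

For example, preservesBehaviour simplify (Add (Mul (Lit 1) Var) (Lit 0)) [-3 .. 3] compares the original expression against its simplified form on seven inputs.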
Refactorings also compose:
a :: Refactoring
b :: Refactoring
c :: Refactoring
c program = b (a program)
-- equivalent to
-- c = b . a
-- We still satisfy the property
-- run program input ≡ run (a program) input ≡ run (b (a program)) input
Because of this, we know we can apply any number of refactorings to our program, and we'll never change the correctness of it. Feasibly, we could make huge changes to our program safely by defining a sequence of refactorings that gets us what we want.
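As a rough sketch of chaining refactorings, assume a deliberately tiny program model where a program is a list of increments added to the input. The model and the names dropZeros, mergeSteps, applyAll, and cleanUp are all hypothetical:

```haskell
-- Hypothetical toy program: a list of increments added to the input.
type Program = [Int]

run :: Program -> Int -> Int
run steps input = input + sum steps

type Refactoring = Program -> Program

-- Adding zero has no effect, so those steps can go.
dropZeros :: Refactoring
dropZeros = filter (/= 0)

-- All increments can be merged into a single step.
mergeSteps :: Refactoring
mergeSteps steps = [sum steps]

-- Compose a list of refactorings, applying them from left to right.
applyAll :: [Refactoring] -> Refactoring
applyAll = foldl (flip (.)) id

-- A bigger refactoring built from the two smaller ones.
cleanUp :: Refactoring
cleanUp = applyAll [dropZeros, mergeSteps]
```

Since dropZeros and mergeSteps each preserve the result of run, cleanUp does too, and it can be used anywhere a single Refactoring is expected.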
Types of Refactorings
Much of the work defining refactorings has been done in languages like Java or Ruby, but they really aren't language specific. Almost all of the refactoring catalog applies to Haskell with some slight modifications (e.g. use functions instead of methods and typeclasses instead of interfaces).
What makes Haskell really interesting from a refactoring standpoint is that every equivalence we have is also a valid refactoring. For instance, in the documentation for id, there's the following line:
id x = x
This is the definition for id, but also an equivalence. I know that whenever I see id x I can replace it with x. I also know that for any arbitrary expression x, I can replace it with id x and it will work.
Various libraries document equivalences like this, and they are useful to have when refactoring.
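One well-known equivalence of this kind is the functor composition law, fmap f . fmap g = fmap (f . g), which for lists reads map f . map g = map (f . g). A small sketch of using it as a refactoring follows; the function names are hypothetical:

```haskell
-- Before: two passes over the list.
doubleThenIncrement :: [Int] -> [Int]
doubleThenIncrement = map (+ 1) . map (* 2)

-- After applying `map f . map g = map (f . g)`: one pass, same result.
doubleThenIncrement' :: [Int] -> [Int]
doubleThenIncrement' = map ((+ 1) . (* 2))
```

Read from right to left, the same law is also a valid refactoring that splits one pass into two.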
An Example
Let's consider a small module that we want to refactor to use lens:
module User (
User(..), hasValidEmail
) where
import qualified Email
data User = User { email :: Email.Email }
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . email
The lens library documents a lot of these properties in the comments of its functions. For example, view defines the equivalence view . to = id
Let's use this to refactor hasValidEmail to use lens:
-- id x = x
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . (id email)
-- view . to = id
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . ((view . to) email)
-- (f (g a)) = (f . g) a
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . (view (to email))
-- Extract Function
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . (view emailGetter)
emailGetter = to email
-- `view` only requires a `Getting`, so we can
-- use a `Lens` instead of a `Getter`
-- https://hackage.haskell.org/package/lens-4.16.1/docs/Control-Lens-Getter.html#t:Getting
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . (view emailGetter)
emailGetter = lens email (\u e -> u{email=e})
-- since `emailGetter` is no longer a `Getter`, rename it
-- Rename Function
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . (view emailLens)
emailLens = lens email (\u e -> u{email=e})
That was a lot of work for a relatively small change! But, since each change was just applying a refactoring, we know that we never broke the code.
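To check the walkthrough end to end without depending on the lens package, here is a sketch that hand-rolls minimal stand-ins for lens and view using only base. The Email type and isValidEmail are hypothetical placeholders for the Email module, which is not shown above, and the Before/After suffixes are only there so both versions can live in one file:

```haskell
import Data.Functor.Const (Const (..))

-- Minimal stand-ins for the `lens` library's `lens` and `view` (assumption:
-- simplified types, enough for this example only).
lens :: Functor f => (s -> a) -> (s -> a -> s) -> (a -> f a) -> s -> f s
lens getter setter f s = setter s <$> f (getter s)

view :: ((a -> Const a a) -> s -> Const a s) -> s -> a
view l = getConst . l Const

-- Hypothetical placeholder for the Email module used in the article.
newtype Email = Email String

isValidEmail :: Email -> Bool
isValidEmail (Email address) = '@' `elem` address

data User = User { email :: Email }

emailLens :: Functor f => (Email -> f Email) -> User -> f User
emailLens = lens email (\u e -> u { email = e })

-- The starting point of the walkthrough.
hasValidEmailBefore :: User -> Bool
hasValidEmailBefore = isValidEmail . email

-- The end point of the walkthrough.
hasValidEmailAfter :: User -> Bool
hasValidEmailAfter = isValidEmail . view emailLens
```

Both versions should agree on every User, which is what the chain of small refactorings is meant to guarantee.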
Combining Refactorings
Doing that every time is really tedious, but we don't need to work at that level every time. Once we have a higher-level refactoring that combines multiple steps, we can use it as a refactoring itself.
If we wanted to make our entire program use lens, here are the 5 steps we'd need to do:
1. Add a lens for the record field email
emailLens = lens email (\r a -> r{email=a})
2. Change every use of the email field as a function to use the lens
email = view emailLens
3. Change every record update that assigns the email field to use the lens
r { email = v } = set emailLens v r
4. At this point, there should be a single consumer of the record field: the lens. We can rename the field to _email
emailLens = lens _email (\r a -> r{_email=a})
5. With the email identifier no longer in use, we can rename our lens.
email = lens _email (\r a -> r{_email=a})
With that, our code looks as if we had designed it around a lens from the beginning, and we have a high-level process for doing this in other places.
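The equivalences behind steps 2 and 3 can be spot-checked with a small sketch. It again uses hand-rolled, base-only stand-ins for lens, view, and set rather than the real library, a simplified User record with String fields, and hypothetical helper names step2Holds and step3Holds:

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Minimal stand-ins for the `lens` library's `lens`, `view`, and `set`
-- (assumption: simplified types, enough for this example only).
lens :: Functor f => (s -> a) -> (s -> a -> s) -> (a -> f a) -> s -> f s
lens getter setter f s = setter s <$> f (getter s)

view :: ((a -> Const a a) -> s -> Const a s) -> s -> a
view l = getConst . l Const

set :: ((a -> Identity a) -> s -> Identity s) -> a -> s -> s
set l v = runIdentity . l (const (Identity v))

-- Simplified record: plain Strings instead of a dedicated Email type.
data User = User { email :: String, name :: String }
  deriving (Eq, Show)

-- Step 1: add a lens for the record field.
emailLens :: Functor f => (String -> f String) -> User -> f User
emailLens = lens email (\r a -> r { email = a })

-- Step 2: reading through the field and through the lens agree.
step2Holds :: User -> Bool
step2Holds r = email r == view emailLens r

-- Step 3: record update and `set` through the lens agree.
step3Holds :: String -> User -> Bool
step3Holds v r = r { email = v } == set emailLens v r
```

Steps 4 and 5 are plain renames, so once these two checks hold for the lens, the remaining work is mechanical.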
More Reading
The Refactoring book is good, and is applicable to Haskell development even if the examples aren't in Haskell. Neil Mitchell has a good example of doing this for a function.