Point of Story Points

How many times have you heard in agile planning meetings that a similar user story has been done before, so the user story in question can benefit from that experience and can be assigned a fraction of the original story point value? If the answer is more than once, chances are your planning meetings are long and contentious, and past velocity rarely comes up when you are filling up the next sprint.

Staying Objective

Story points are supposed to be an objective and quantifiable measure of work amount and complexity. Having such a measure, one can approximate the time required to complete work using the skill level of the person assigned to do that work. Sounds simple, except that in software development, work may be somewhat tricky to represent in tangible units.

Work Amount

Let's consider a hypothetical job that is easier to visualize than a typical software development activity. Imagine that we need to salvage bolts from old car chassis and for each chassis type we know how many bolts we can potentially salvage. That number is our starting point for defining story points objectively.

Let's say a chassis of type 'A' has 100 salvageable bolts. A senior worker can remove 50 bolts per hour, so they will be done in 2 hours. A junior worker can remove 25 bolts per hour, so they will be done in 4 hours. In either case the amount of work remains 100 bolts and is not affected by the experience of a worker or even by whether one or more people are doing this work in parallel.

Knowing the velocity of each worker on this team, we can plan how long a chassis of type 'B' with 200 bolts and a chassis of type 'C' with 400 bolts would take to process and how many chassis of all types we can process in a week-long sprint.
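The arithmetic above can be sketched in a few lines of Python. The bolt counts and hourly rates come from the example, while the function names, the team of one senior and two junior workers, and the 40-hour sprint length are assumptions made purely for illustration.

```python
# Bolt counts per chassis type, taken from the example above.
BOLTS = {"A": 100, "B": 200, "C": 400}

# Bolts removed per hour by each kind of worker, taken from the example above.
RATE = {"senior": 50, "junior": 25}


def hours_to_process(chassis_type, worker):
    """Hours one worker needs to strip one chassis of the given type."""
    return BOLTS[chassis_type] / RATE[worker]


def team_rate(team):
    """Combined bolts per hour for a list of workers."""
    return sum(RATE[worker] for worker in team)


def sprint_capacity(team, sprint_hours):
    """Total bolts a team can remove in a sprint of the given length."""
    return team_rate(team) * sprint_hours


if __name__ == "__main__":
    # Hypothetical team and sprint length (assumptions, not from the post).
    team = ["senior", "junior", "junior"]
    capacity = sprint_capacity(team, sprint_hours=40)
    print(hours_to_process("A", "senior"), hours_to_process("A", "junior"))
    print(capacity, capacity // BOLTS["C"])
```

Note that the work amount (the bolt count) never depends on who does the work. Only the time does.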

Work Complexity

In reality, many types of work cannot be measured by counting neatly organized work items. In our hypothetical scenario, chassis would come from a junkyard and some will have more rust than others, some bolts will be wrecked beyond being salvageable and some will be painted shut. A rusted chassis of type 'A' still may have 100 bolts, but now workers have to apply additional techniques to loosen up bolts and some will be rendered unusable in the process, resulting in increased complexity and reduced output.

Now we can no longer just count bolts to estimate work and need to factor in complexity, such as how much rust there is and how damaged a chassis looks at first glance. The net effect is that the same 100 bolts may be estimated as 100 points or 150 points, depending on these additional non-quantifiable factors.
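One way to picture this is as a work amount scaled by a judged complexity factor. This is only a sketch: the function name and the idea of a single numeric factor are assumptions, and in practice the factor is exactly the subjective part that somebody has to eyeball.

```python
def estimate_points(bolt_count, complexity_factor=1.0):
    """Scale the countable work amount by a judged complexity factor.

    A factor of 1.0 means a clean chassis; anything above 1.0 stands for
    rust, paint, and damage. The factor itself is a judgment call.
    """
    if complexity_factor < 1.0:
        raise ValueError("complexity_factor must be at least 1.0")
    return round(bolt_count * complexity_factor)


if __name__ == "__main__":
    print(estimate_points(100))       # clean chassis of type 'A'
    print(estimate_points(100, 1.5))  # rusted chassis of type 'A'
```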

Non-quantifiable factors are harder to evaluate and require more experience to estimate, so a round-table discussion may not be as productive as one would hope because experience brings more subjectivity and more conflicting opinions from team members.

Velocity

Velocity is where work amount and complexity meet skill and team composition. The former should be fairly evident at this point, but the latter may be a bit more subtle, and velocity is still mistaken by some for a team performance indicator of sorts.

Consider what happens if, in our hypothetical salvage team of 1 senior worker and 2 junior workers, the senior worker is replaced with 2 junior workers on the assumption that the velocity will remain the same. It may turn out that the 2 junior workers on the original team were able to salvage 25 bolts per hour only because the senior worker provided some guidance. Without that guidance, each junior worker can remove only 15 bolts per hour, so the resulting velocity will be much lower.
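Put into numbers, the scenario looks like this. The rates are the ones used in this post, and the function name is just an illustrative assumption.

```python
def team_velocity(rates):
    """Team velocity as the sum of individual bolts-per-hour rates."""
    return sum(rates)


# Original team: 1 senior worker at 50 and 2 guided junior workers at 25 each.
original = team_velocity([50, 25, 25])

# What the swap was expected to deliver: 4 junior workers still at 25 each.
expected = team_velocity([25] * 4)

# What it may actually deliver: 4 unguided junior workers at 15 each.
actual = team_velocity([15] * 4)

if __name__ == "__main__":
    print(original, expected, actual)
```

On paper the swap is neutral because two junior workers at 25 bolts per hour match one senior worker at 50. The hidden dependency on guidance is what breaks the forecast.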

Velocity numbers are only meaningful within the particular team that produced them and for the kind of work the team has been focused on for a while. Changing the work area, work process or team composition will change velocity, and it will take some time for the new velocity trend to emerge.

Using velocity as a team performance indicator in any shape or form will just tempt the team to change their story points to improve velocity, consciously or not, and will negatively impact the quality of estimates based on such story points. Sprint velocity is a planning and forecasting tool and will not do any good for measuring a team's performance.

Déjà vu

Back to the we-have-done-that-before point. When a team member goes through the same amount of work faster than in previous engagements, it is not because work amount or complexity somehow got smaller, but rather because they acquired more skill and can work more efficiently.

Work amount is simpler to visualize because it usually translates into some units of work, such as number of files to process, number of components to introduce, and so on. Work complexity, on the other hand, is based on the currently adopted work process, which may be quite hard to visualize in any measurable units. However, neither amount nor complexity changes when a more skilled worker does the work.

Work amount may be changed only by removing work items. For example, if we wanted to allow emojis in data backup titles of some data backup product and needed to change several backup media drivers to accept UTF-16 surrogate pairs, we could decide to drop support for tape backups in the next release, so we could exclude the tape backup source and reduce the amount of work.
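For readers unfamiliar with the term, here is a minimal Python sketch of what a surrogate pair is and why every driver that handles titles as UTF-16 would have to be touched. The helper names are assumptions for illustration and are not part of any backup product.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of a string as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def needs_surrogate_pairs(text):
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    emoji = "\U0001F600"  # grinning face, code point U+1F600
    # A single emoji occupies two 16-bit code units: a high and a low surrogate.
    print([hex(unit) for unit in utf16_code_units(emoji)])
    print(needs_surrogate_pairs("Nightly backup"), needs_surrogate_pairs(emoji))
```

Code that assumes one 16-bit unit per character will miscount or split such titles, which is the work item repeated for each driver.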

Work complexity may only be changed by changing the work process. Changing application architecture, replacing in-house components with 3rd-party integrations, introducing scripts to replace manual deployment checklists, and so on, all change work complexity. For example, replacing build VMs with containerized build environments makes dependency configuration work simpler and dramatically shortens the deployment time.

Doing similar work in another user story may result in better velocity, based on whether there is room to improve team skill in that work area, but it does not change work amount or work complexity.

Combined Complexity

Software development is hard to quantify. Most well-designed applications reuse components and new user stories often require implementation that has not been done before within the organization, even for solutions that take full advantage of 3rd-party components and have a fair amount of boilerplate code involved.

For example, we might want a new feature for a discussion board site that would allow users in a discussion thread to create a breakaway room for a private conversation. This feature could have a user story along the lines of "Users participating in a discussion thread shall be able to create a private breakaway room for participating users".

After discussing requirements and possible solutions with the team, a team lead might create two tasks along the lines of "Implement UI to allow users to create a breakaway room and invite other users" and "Implement a set of REST API endpoints to manage breakaway rooms".

The first task would consist of a series of steps that have already been done before for other UI features, which makes it easier to estimate required work and use the same number of story points as was used for any story implementing UI of a similar size.

The second task, however, has never been done before, as is the case with most software development towards new product features, and from the new code perspective is much bigger than the first task. It will also likely require a more detailed solution to describe how the existing discussion thread mechanism should be used for breakaway rooms, how these rooms should be maintained to be visible only to their participants, how the new configuration should be stored in the database, and so on.

This is where the water gets muddy. We now have two tasks contributing to the complexity of the parent user story, but there is no clear way to combine those tasks because each solution defines its own complexity scale that cannot be compared to the other one.

One logical approach here would be to track work with two different user stories, not tasks, so each can be estimated independently. One for the backend team to implement a set of REST API endpoints to manage breakaway rooms, and one for the UI team to build a UI using new REST API endpoints. Each team would use their own definition of work complexity based on their solution, so each user story gets story points derived from a specific solution, making story points in each case relevant for future planning.

However, independently deliverable user stories may be harder to implement and verify because neither can be truly tested without the other one completed. Some teams use mocks and dark features to get around that, but it is a rather large discussion for another blog post.

Combined work amount and complexity produce somewhat arbitrary story point numbers because they account for more than one loosely related work activity, expressed as tasks within such a user story, and each task may be assigned to a different team member with a different skill level. This makes effort estimates less accurate, so if one such story could be implemented within a sprint before, it does not necessarily mean the next one with the same story point value will fit in another sprint.

Keeping user stories focused on one aspect of user-visible functionality makes them more predictable and easier to work with, but sometimes trying to accomplish that will present challenges that are hard or even impossible to resolve.

Design by Committee

Sometimes agile planning meetings turn into hours of discussions and voting poker, which is not surprising, considering that more often than not these meetings take place before a solution is sketched out for each user story being considered for a sprint.

There is a lot of conflicting guidance in various agile blogs as to when a solution should be available throughout the user story lifecycle. Some suggest during backlog refinement, when stories are estimated, some caution that an early solution may prevent developers from being able to apply a better one during implementation, and some suggest holding a dedicated design session in some cases. It is agile after all, so nothing is set in stone.

The reality is, however, that if somebody asks you how to get to Main Street without any implied context (e.g. you got a text with this question), you will ask where they are departing from (starting and destination points form a functional requirement), whether there is a desirable time and cost for the trip (a non-functional requirement) and whether they can drive or prefer walking (a constraint). Then you would come up with a solution route in your head, and only then would you estimate that it will take them about 15 minutes to get there. Requirements leading to a solution, leading to an estimate, sounds like a reasonable sequence to follow, but in agile many try to come up with estimates using partial requirements, sometimes even just those couple of words in the story title, and whichever opinion passes as a solution in a planning meeting.

Another common thing in many agile publications is that magical agile team that has members who can design, estimate, implement and test things equally well. In real life, agile teams span a huge spectrum, from startup company teams, where everyone is an expert and is driven to keep their company on the path to success, to large teams, where everyone is well-specialized in their area and may switch only to adjacent areas, to even larger teams, where product well-being may not depend on its technical quality, where technical leads are appointed, where some developers are eyeing one of those lead roles and push their ideas whether they make sense or not, while others are just trying to learn new skills for when they need to move on, and some are just trying to do their job the best they can. Needless to say, planning meetings will go drastically differently in such teams, and failing to recognize these differences and adapt will show up as long and fruitless planning meetings.

From my highly subjective experience, I find that skill and experience do matter and having a solution written by an individual collaborating with others at their discretion, discussed at their level first (e.g. among tech leads, among developers, etc.), then discussed at the team level and then reviewed for accountability at some point after it has been in production for a bit, works best for product quality and development efficiency. In other words, absolutely everybody should feel welcome to share their ideas and opinions, but at the end of the day the decision lies with the person who will be accountable for the result. Your mileage may vary.

Relativity

Yet another common piece of online agile advice is to treat story points as a relative value compared against other user stories. On the surface it makes sense, as some stories may be more complex than others, but in practice this is quite counterproductive advice, because inevitably you run out of relative levels and end up having to shift a bunch of already-estimated stories to squeeze in a new one.

Some try to remedy this problem by arranging stories by their perceived complexity first, without giving them estimates, so it is more organized, but grouping this ordered list into story point bands is a tedious and expensive exercise, and it just delays the eventual need to re-estimate stories.

If you were to ask a developer in a hallway for a casual opinion about some feature, no pressure, they would say something along the lines of "not sure - need to look into it" or "oh, that's a big one" or "I think I can do it reasonably quickly" or "that's a couple of lines of code", and so on. Using T-shirt sizes to estimate stories rather than numbers does just that and then the number associated with each T-shirt size can be used as a story point value.

Using T-shirt sizes eliminates the notion that a numeric story point value defines some precision for work complexity, which tempts people to come up with more elaborate estimates. Using size names makes it easier to keep stories in their size band.
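A minimal sketch of this mapping is shown below. The size names, the point value assigned to each size, and the function names are all assumptions chosen for illustration; each team would pick its own bands.

```python
# Hypothetical size bands and their point values (assumptions, pick your own).
SIZE_POINTS = {"XS": 1, "S": 2, "M": 3, "L": 5, "XL": 8}


def story_points(size):
    """Translate a T-shirt size into its story point value."""
    try:
        return SIZE_POINTS[size.upper()]
    except KeyError:
        raise ValueError(f"unknown T-shirt size: {size!r}") from None


def sprint_load(sizes):
    """Total story points for a list of stories given by T-shirt size."""
    return sum(story_points(size) for size in sizes)


def fits_in_sprint(sizes, average_velocity):
    """Check the planned load against the team's recent average velocity."""
    return sprint_load(sizes) <= average_velocity


if __name__ == "__main__":
    planned = ["S", "M", "XL"]
    print(sprint_load(planned), fits_in_sprint(planned, average_velocity=15))
```

The team only ever argues about the size name. The number stays an implementation detail of forecasting.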

Conclusion

Estimating work complexity in software development is tricky and many end up estimating work amount and complexity as how hard it would be for them to do the work, which effectively yields an estimate for effort expressed as a story point value.

Story point values with mixed-in effort are misleading for forecasting and they make planning meetings contentious because people, effectively, end up discussing whose effort estimate is more correct, which is quite meaningless, because for each person their effort estimate is correct if they were assigned to work on that story.

Many consider story points an inseparable part of agile development and keep forcing them on their team regardless of whether they work or not. In my years of agile development across several teams I learned that if a team cannot agree on story point estimates sprint after sprint, it is more efficient to forgo story points altogether and just estimate effort in story tasks rather than to waste time trying to reach a consensus on whether it is an 8-point or a 20-point story when there is no common scale for evaluating work amount and complexity in the team.

All in all, if your velocity is consistent from sprint to sprint, you can forecast when it will go up or down, and your team members don't roll their eyes when they head to a planning meeting, you probably wasted your time reading this post. Otherwise, it might be worth having another look at how your user stories are estimated and whether the current process is truly working for your team or you are just following the agile checklist, hoping that one day some of the steps will make sense.
