What Every Software Engineer Should Know About Documentation

November 01, 2017

Ask any software engineer what big issues they face day to day. More often than not, they will vent: "Poor documentation, and finding the information I need to get my work done."

I'm sure this rings true for anyone who has interfaced with external APIs, worked in a large tech organization, or recently joined a legacy project.

But why would this be? "Doesn't everyone know they should at least write a readme? I thought wikis were trivial to set up nowadays?" That may be true in select cases, but more often than not, a developer crunched for time will simply focus on shipping features.

Codebases with substandard documentation will, over time, arguably lead to poor user experiences, wasted engineering effort and, at worst, product failures.

Given the potential impact of bad documentation, you're probably thinking, "OK, what are some ways I can improve my docs, ASAP?" But before we get to what it takes to create good documentation, let's start by identifying the problems most projects' docs end up suffering from.

Common Issues with Documentation

Documentation, if it exists at all for a project or codebase, often suffers from a number of issues:

  1. It is easily susceptible to bit-rot, meaning that over time, what the docs describe becomes misaligned with what the codebase actually does. Documentation may genuinely be a source of truth when written. However, as underlying abstractions change, interfaces improve, and developers come and go, docs that are not automated and linked directly to the source code will become stale and decoupled from the codebase.
  2. It often has poor coverage of needed topics. The person writing documentation is not always its eventual reader. For example, a developer documenting a feature might describe in detail the overarching architecture they followed, yet completely overlook the more pertinent debugging processes or the assumptions the code makes. That information is arguably far more useful day to day for, say, a developer scrambling to find out why a bug only shows up at midnight on Sundays.
  3. It lacks empathetic writing, that is, writing for the context of the eventual reader. Empathetic writing is hard. It requires you to anticipate what the reader will be looking for at some future time, what problems they are trying to solve when reading your docs, and many questions you may not have answers to when originally writing them.
  4. It's hard to budget for, capture, and write down during the day to day. Practically all meaningful code lives within constraints such as deadlines and budgets. Many of our day-to-day tasks could have been captured for later reference, but unless you set aside dedicated time and money for capturing and writing documentation, these small events get lost in the flow of development.
  5. Even if something is described somewhere, a large body of docs creates a discovery problem. Runbooks stored in random file paths. RFCs stored on Google Drive. Bugs tagged in JIRA. Sprints described in Phabricator. Each silo describes a different aspect of the project. Without standards, it takes a lot of searching to find and piece together the needed information, and search is genuinely hard.
  6. If people find it, they usually don't read it. People don't read on the web, they scan, so it's freakin' hard to direct people's attention toward what's important. Even when a topic feels like it deserves lots of detail, length usually works against you, and readers will scan right past your well-written prose. For example, this very paragraph is less likely to be read than its bolded first sentence.
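One concrete way to link docs directly to the source code, as the first item suggests, is to make the examples in them executable. Below is a minimal sketch using Python's standard `doctest` module; the `parse_duration` function is hypothetical, chosen only to have something to document. If its behavior drifts from what the docstring claims, running the doctests fails loudly instead of the docs rotting silently.

```python
import doctest


def parse_duration(text: str) -> int:
    """Convert a duration such as "1h30m" into seconds.

    The examples below double as tests: doctest runs them and fails
    as soon as the code stops matching what the docs say.

    >>> parse_duration("90s")
    90
    >>> parse_duration("1h30m")
    5400
    """
    units = {"h": 3600, "m": 60, "s": 1}
    total, number = 0, ""
    for ch in text:
        if ch.isdigit():
            number += ch  # accumulate digits until a unit letter appears
        elif ch in units and number:
            total += int(number) * units[ch]
            number = ""
        else:
            raise ValueError(f"bad duration: {text!r}")
    if number:
        raise ValueError(f"number without a unit: {text!r}")
    return total


if __name__ == "__main__":
    # Prints nothing when every documented example still holds.
    doctest.testmod()
```

Running the examples in CI (for instance with `python -m doctest yourfile.py`) is what turns them from prose into a check.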

Keep in mind this list is by no means exhaustive, and you probably have plenty of other examples in mind, but it should help you spot similar issues the next time you look at your project's docs.

Now that we have listed the issues we often come across when dealing with documentation, let's look at what good documentation should entail.

What Successful Documentation Looks Like

  • It's pertinent, contextual, clear, and brief. Good documentation provides context on first contact. It gives users support and actionable information when they most need it. Information is surfaced in compact, actionable form, and when something really has to be wordy, it teaches rather than tells.
  • It lays out the processes and checklists required for every aspect of the product. Oh, you don't have checklists? You feel your environment is too chaotic to be defined by checklists? Please do your brain a favor and create some. You will be amazed at the reduction in effort, time, and bugs, among plenty of other benefits, once you start using checklists.
  • It's preemptive and solves common issues before they happen. Look at your run-of-the-mill FAQ. Many of those questions are begging to be automated away, or at least partially solved. Good documentation highlights these areas and provides frictionless ways to fix common issues.
  • It understands that users probably won't read it. In fact, most people just scan, and the way to get meaningful information across is strategic and generous use of chunking and variation. So, what will readers actually look at?
    1. Meaningful text and images
    2. The beginnings of paragraphs (the first 3-5 words)
    3. Bullet lists
    4. Variations in typeface (links, bold, etc.)
    5. Code snippets

Wrong Docs < No Docs

At one point I enjoyed taking this stance on documentation:

Documentation is like sex. When it's good, it's very good. When it's bad, it's better than nothing.

However, after wasting time reading outdated docs and being burned by incorrectly listed build steps, I have come to the conclusion that bad documentation is frequently worse than no documentation at all. If the documentation says one thing but the code does another, then you have just wasted your precious time reading useless material.

And just as losing an end user's trust will forever mark your product, losing developers' trust in the documentation has a lasting effect on them.

Here's my take on which kinds of documentation stay faithful to the underlying code the longest. Put another way, here is documentation ranked from the fewest to the most times I end up saying "wtf" after using it.

Documentation, ranked from most accurate to absolutely guaranteed to be wrong:

  1. Reading the source
  2. Reading the tests
  3. Inline comments or docstrings
  4. API docs
  5. Architecture diagrams
  6. Wikis

Documentation as a productivity multiplier

Over time, a project accumulates a growing number of files, folders, classes, and other ancillary material, becoming an unwieldy digital mass to wade through.

Each document holds answers to the various questions you ask yourself day to day in order to get work done:

  • Where things are located
  • How events propagate
  • How test cases are set up
  • How configuration is handled

The time to answer these questions grows longer and longer, as all of these little bits of knowledge have to be internalized.

Over time, the mental lookups required to recall this knowledge become a seriously limiting factor on productivity. I would not call this technical debt per se; I like to call it documentation debt.

With this issue in mind, it should be clear how documentation becomes the external brain of the project. Documenting all this knowledge should be a priority, because it has to happen for a project to scale beyond a single developer.

Ideal Meta Information to Surface per Document

  • The timeline/histogram of updates to a document. Most wikis show only how long ago a document was updated. This is a good starting point for understanding where in the document's lifecycle you are reading it, but on its own the statistic is of limited use. More important is the series of edits and how they are distributed. If a document is updated periodically in chunks, that might imply it is in good health, with its information updated as versions are released, fixit weeks are tackled, and so on. However, if the edits are all clumped together near the beginning of its history, with only a few minor edits recently, the document may have been neglected and its information may be outdated. One solution takes the form of an activity chart, e.g. GitHub's Pulse chart.
  • Key points in its edit history. Combine the edit timeline with external events such as major project milestones, outage incidents, and team member changes. This provides some of the best available context for understanding how the document has evolved and when it was updated in response to events. Equally important is marking where the document was not updated. One example comes from the finance domain: Google's finance charts include news points inside the price chart's timeline.
  • Links back to specific files, concepts, etc. that are directly referenced or discussed within the document. Tracking project changes in lockstep with the referencing documentation is the single most important thing you can do to maintain the health of your projects. To have a robust ecosystem of documents, you need automation wherever possible. The first step is tags or links between documents and the digital material they discuss. Sometimes this comes for free, e.g. docs generated automatically from source comments, or mocked testing interfaces produced by automatic parsers. However, the most important things, such as high-level concepts, important characteristics, and usage patterns (basically anywhere there is no trivial mapping from a single file in the project to a concept), are really hard to automate, yet even more important to document.
  • Common search queries that developers, users, managers, etc. ask. Just as a blog optimizes its meta information and content to target the queries its audience searches for, a project's documents should do the same. To build toward this, documentation should ask for quick feedback on what users were searching for: a prompt asking "What were you intending to find?" and perhaps a yes/no "Did you find it?". A more AI-based implementation would make each document aware of the queries it could answer.
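The clumped-edits signal from the first bullet can be turned into a rough automated check. The sketch below is a hypothetical heuristic, not a feature of any particular wiki: it assumes you can export a document's edit dates (for example from version control), buckets them per month to feed a histogram, and flags the document as possibly neglected when most of its edits fall in the first half of its life and almost none are recent. Every threshold is an arbitrary assumption to tune.

```python
from collections import Counter
from datetime import date


def edits_per_month(edit_dates: list[date]) -> list[tuple[str, int]]:
    """Histogram of edits keyed by "YYYY-MM", in chronological order."""
    counts = Counter(d.strftime("%Y-%m") for d in edit_dates)
    return sorted(counts.items())


def looks_neglected(
    edit_dates: list[date],
    today: date,
    recent_days: int = 180,     # assumption: what counts as "recently"
    early_share: float = 0.8,   # assumption: share of edits that counts as "clumped early"
    max_recent_edits: int = 1,  # assumption: "only a few" recent edits
) -> bool:
    """Hypothetical heuristic: edits clumped early, few or none recently."""
    if not edit_dates:
        raise ValueError("no edit history to judge")
    edits = sorted(edit_dates)
    created = edits[0]
    lifetime_days = (today - created).days
    if lifetime_days <= recent_days:
        return False  # too young to judge either way

    halfway = lifetime_days / 2
    early = sum(1 for d in edits if (d - created).days <= halfway)
    recent = sum(1 for d in edits if (today - d).days <= recent_days)
    return early / len(edits) >= early_share and recent <= max_recent_edits


if __name__ == "__main__":
    history = [date(2016, 1, 1), date(2016, 1, 5), date(2016, 2, 1), date(2016, 3, 1)]
    print(edits_per_month(history))
    print(looks_neglected(history, today=date(2017, 11, 1)))
```

Comparing against the lifetime midpoint, rather than a fixed date, keeps old and new documents on the same footing; it cannot tell a minor typo fix from a real update, which is why a human still needs to read the chart.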

How to Tell if Your Documentation Is Effective or Improving

So you have your readme files all updated and in sync with the codebase. You have runbooks for the various system management tasks. Your onboarding guide might even have earned some kudos from a newly hired engineer. But how do you objectively tell whether your documentation is working? Or getting better? Or, on the flip side, slipping into disrepair?

  • Average time to resolution. If you're tracking bugs, outages, etc., you can calculate how long certain issues take to resolve. Track those durations over time, smoothing out the noise. At the very least you will have a record of your outages, perhaps a pretty graph for a dashboard, and hopefully a good benchmark for tracking overall improvement in your processes.
  • Time since last view and duration between subsequent views. As docs get written, certain documents will be used far more than others, probably following the Pareto principle. This usage can be captured as views over time. The value comes from looking at the docs in aggregate and relative to each other. Minimizing outliers, the docs that are rarely accessed, would be a useful goal. And by identifying content that is accessed in quick spurts, you can measure similarities and optimize how similar docs are grouped and linked.
  • CSAT and other satisfaction scores. These are only useful if you're able to gather a large number of responses, but there is such a wealth of information on measuring and improving these metrics that you will have no trouble finding ideas on how to implement them.
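The first metric is straightforward to compute once issues carry open and resolve timestamps. A minimal sketch, assuming a hypothetical list of `(opened, resolved)` pairs exported from your tracker, and using a plain trailing moving average as the smoothing method (a rolling median or exponential average would also work):

```python
from datetime import datetime


def resolution_hours(opened: datetime, resolved: datetime) -> float:
    """Hours between an issue being opened and being resolved."""
    if resolved < opened:
        raise ValueError("resolved before opened")
    return (resolved - opened).total_seconds() / 3600


def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Trailing moving average; one output per full window of values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def smoothed_time_to_resolution(
    incidents: list[tuple[datetime, datetime]], window: int = 3
) -> list[float]:
    """Resolution times in hours, ordered by resolve time, then smoothed."""
    ordered = sorted(incidents, key=lambda pair: pair[1])
    durations = [resolution_hours(opened, resolved) for opened, resolved in ordered]
    return moving_average(durations, window)


if __name__ == "__main__":
    incidents = [
        (datetime(2017, 1, 1, 0), datetime(2017, 1, 1, 10)),
        (datetime(2017, 1, 2, 0), datetime(2017, 1, 2, 2)),
        (datetime(2017, 1, 3, 0), datetime(2017, 1, 3, 6)),
        (datetime(2017, 1, 4, 0), datetime(2017, 1, 4, 4)),
        (datetime(2017, 1, 5, 0), datetime(2017, 1, 5, 1)),
    ]
    print(smoothed_time_to_resolution(incidents))
```

A downward trend in the smoothed series is the hoped-for signal, though documentation is only one of many factors that move it.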


Matthew Clemens © 2022