This document collects the observations I have accumulated during my time as a software engineer. The content is most relevant to web-based consumer applications, but much of it applies to software development in general. The observations range in length from short one-liners to sections of detailed prose.

As a disclaimer, I do not claim that anything written here hasn't been written before. What I will claim is that I wrote an observation down only when an experience taught me a hard-learned lesson or produced a counterintuitive result worth explaining. I think that gives a high signal-to-noise ratio for those of us looking to learn and jump ahead of our peers.

I have tried to write as clearly as possible, but I make mistakes all the time, so if something is confusing or incorrect, feel free to reach out.

I should also warn you that I've been told my writing style is a bit unusual. I tend to write in the first-person plural, and occasionally I switch to writing as if my future self will have long forgotten the original experiences.

So, in no particular order, here are the observations I have made so far on software engineering and development.

Production & User Experience

A user's first interaction with our product forms a strong initial opinion. That opinion is very difficult to change and influences every subsequent interaction. [Source] Yet during each iteration of product development, at some point we must deploy additional code, launch new features, or release something new.

It is at these moments that we must remember how much first impressions matter, and take the necessary steps to mitigate bad first impressions and other risks.

One naive solution is to avoid early first impressions altogether, perhaps by holding off on releasing a product until we know with 100% certainty that it will be effective and bug-free. But actually following that would be insane. It would prevent us from enjoying the benefits of short iteration cycles, constant experimentation, and important early feedback.

So, to be clear, I'm not advocating that we go completely out of our way to avoid regressions, hold back product launches, or even track a count of bugs. Instead, I prefer to base our goals on a time-based metric.

The Goal

A more realistic goal is to look at the process required to resolve bad updates to our product, and then minimize the time that process takes to complete.

As a concrete example, let us look at the process of fixing a software bug deployed in production. The process is made up of multiple steps, each potentially difficult and time-consuming. It usually encompasses the following:

  1. Detection. Enabled by observability processes, e.g. logging, traces, and metrics.
  2. Mitigation. Enabled by a site reliability mindset, e.g. automation, notifications, CI, and virtualization.
  3. Communication. Enabled by semi-automated people processes that notify those affected by the bad update, e.g. social media relations, SLA dashboards, internal bug trackers, etc.
  4. Resolution. Enabled by productive developer environments and tooling, e.g. fast code search, test coverage goals, and code reviews.
  5. Debrief. Enabled by an organization with a learning mindset, e.g. postmortems, case studies, and knowledge sharing.

If you're already familiar with this process, I hope I didn't offend you with that list. I am probably missing some steps, as I can't possibly enumerate, let alone experience, everybody's bug-resolution processes. These are only the steps I have encountered frequently.

Notice that I listed some steps that may not directly affect the end user, such as the debrief, but I still take them into consideration. My justification is that when multiple bugs are ongoing, time spent debriefing one bug takes time away from the user-facing steps of the others. [Fish Diagram]

The key takeaway, and the mantra I remember throughout any project, is this:

I must minimize the time users are exposed to poor experiences.
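To make the mantra measurable, I find it useful to timestamp each step of the bug-resolution process. Below is a minimal Python sketch under my own assumptions: a hypothetical `Incident` record, steps that finish in the order listed earlier, and user exposure ending at mitigation (say, a rollback) rather than at resolution. Your definitions may well differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The five steps of the bug-resolution process, assumed to finish in this order.
STEPS = ["detection", "mitigation", "communication", "resolution", "debrief"]


@dataclass
class Incident:
    """Hypothetical record: when the bad update shipped and when each step finished."""
    introduced_at: datetime
    completed: dict[str, datetime]  # step name -> completion time

    def step_durations(self) -> dict[str, timedelta]:
        """Time spent in each step, measured from the end of the previous step."""
        durations = {}
        previous = self.introduced_at
        for step in STEPS:
            finished = self.completed[step]
            durations[step] = finished - previous
            previous = finished
        return durations

    def user_exposure(self) -> timedelta:
        """Assumption: users stop being exposed once the bug is mitigated."""
        return self.completed["mitigation"] - self.introduced_at


t0 = datetime(2022, 1, 1, 12, 0)
incident = Incident(
    introduced_at=t0,
    completed={
        "detection": t0 + timedelta(minutes=30),
        "mitigation": t0 + timedelta(minutes=45),
        "communication": t0 + timedelta(hours=1),
        "resolution": t0 + timedelta(hours=4),
        "debrief": t0 + timedelta(days=2),
    },
)
print(incident.user_exposure())  # 0:45:00
```

Tracking durations per step, rather than a single total, shows which step is worth investing in next; in this made-up incident, detection alone accounts for two thirds of the exposure.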

Preventing poor experiences

First off, what are we even trying to prevent? Just what is a poor experience?

From a user's point of view, a poor experience can simply mean the app is hard to use. To a UI/UX designer, it may mean the user's mental model ends up convoluted or differs from the one the designer intended. To a product manager, it might mean an underutilized feature that isn't gaining traction. To an engineer, it might mean a user encountering a bug.

The factors and mistakes that cause poor experiences are just as varied as the definitions of a poor experience.

Here are some of my own past mistakes which I know contributed to poor experiences:

  • Buggy code making it all the way into production.
  • Writing poorly constructed body copy and help messages.
  • Building features in isolation.
  • Planning and allocating time for misguided features.

Each of those mistakes could have been prevented. Here are a few of the most consistent methods I use to prevent them, and consequently, poor user experiences:

  • Consistent, automated integration tests. A tight feedback loop that catches trivial issues up front has a cascading effect on the quality of production code.
  • Design Sprints. I enjoy the design sprint process as it can quickly test whether a minimal product's core idea is any good. Plus I like the low cost of iteration it provides.
  • Thoroughly reviewed source code. When multiple sets of eyes look at my code, the knowledge embedded in it spreads. Plus, if anything goes wrong, shared review tends to diffuse the blame and gives the reviewers involved a sense of urgency.
  • Automated & actionable monitoring. Quick bug detection requires having the right hooks in place to understand what the code is doing the first time around.
  • And my favorite: rollouts to existing cohorts. If experienced users are unfortunately exposed to errors, they tend to be slightly more forgiving, sometimes even alerting us to problems.
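A rollout to existing cohorts can be as simple as a deterministic percentage gate that only considers established accounts. Here is a minimal Python sketch; the `User` type, the feature name, and the 90-day "existing user" threshold are all hypothetical choices of mine, not a specific feature-flag product's API.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class User:
    user_id: str
    signed_up: date


def rollout_bucket(feature: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user: User, percent: int, today: date,
               min_account_age_days: int = 90) -> bool:
    """Enable a feature only for existing users, and only for `percent`% of them."""
    account_age = (today - user.signed_up).days
    if account_age < min_account_age_days:
        # New users keep the stable experience for their first impression.
        return False
    return rollout_bucket(feature, user.user_id) < percent


today = date(2022, 6, 1)
veteran = User("u-123", signed_up=date(2020, 1, 15))
newcomer = User("u-456", signed_up=date(2022, 5, 20))
print(is_enabled("new-editor", newcomer, percent=100, today=today))  # False
print(is_enabled("new-editor", veteran, percent=100, today=today))   # True
```

Hashing the feature name together with the user ID keeps each user in the same bucket as the percentage grows, so raising the rollout from 10% to 50% only adds users and never flips anyone back.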

To help me remember this idea of preventing bad user experiences, I amuse myself with the following joke:

Everybody has a testing environment. Some people are lucky enough to have a totally separate environment to run production in. - Unknown

Maintaining the Reputation of my Software

First impressions matter, first interactions matter, and personally, my reputation matters.

The moment a new user finds some piece of the product delightful is the moment trust is gained, and I feel like our software's reputation can grow.

But make a mistake by providing a buggy experience, and we are in a world of hurt. It can take 10x as many good interactions to make up for a single buggy one. It all comes down to building a positive reputation from the very first interactions.

When developing software that an end user interacts with (which is all software; even server-side software interacts with external developers at some point), we must consider the tradeoff of releasing a work in progress to our users.

To help me decide whether this tradeoff is worth it, I ask myself a few key questions:

  • If I release a bug and a user encounters it, will I be diminishing the trust the user has in our product? Would the user doubt the reliability of the feature even after its bugs have been fixed?
  • Am I being upfront, explicit, and clear about which areas are a work in progress? Have I used disclaimers, caveats, and other clarifying messages to provide transparency to users as they interact with the software?
  • Can I identify the most frequently used areas of the product, where trust could easily be lost? Am I closely monitoring these areas? Have I considered UI, async tasks, and any other areas where a user's expectations need to be met consistently, uniformly, and without friction?

Prepare the next generation.

We are all continuous learners. Each of us comes across or understands new things at different points in time. It is important to leverage our compounding knowledge not just for ourselves but for those who come after us.

If I have seen further than others, it is by standing upon the shoulders of giants. — Isaac Newton

We should document and share our dead ends, our failures, our successes, and all the other time-consuming experiences we have faced.

Do not just tell people why things are the way they are; show them. Weave a narrative that shows how we ended up where we are now. But don't stop there; also give thought to what we can become.

Ideally, documentation in all of our projects should be automatic, or even a byproduct of the development process: documentation that provides richer context, such as timelines and snapshots; documentation that describes how events unfolded; documentation that lists the decisions made for each new feature along the way.

Make it easier for newcomers to have an immediate impact.

An easily bootstrapped dev environment is paramount for letting developers get right into the thick of making changes to the codebase. No one wants to waste hours, let alone days, getting everything up and running. Dead-simple, automated installs are ideal. At the very least, simple copy-and-paste procedures should be documented.

One of the highest-leverage activities I do is writing scripts that make life more convenient for those around me. Being around software developers all day long has made me forget that most of us don't yet know how to automate repeatable day-to-day processes.
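A good first script for a new repository is often a bootstrap check that tells a newcomer exactly what is missing. Here is a minimal Python sketch; the tool list and install hints are hypothetical placeholders for whatever a given project actually needs. It only reads `PATH` via `shutil.which`, so it is safe to rerun as often as you like.

```python
import shutil

# Hypothetical requirements; replace with the tools your project actually needs.
REQUIRED_TOOLS = {
    "git": "install Git with your OS package manager",
    "python3": "install Python 3 with your OS package manager",
    "docker": "follow the team onboarding doc (hypothetical)",
}


def missing_tools(required: dict[str, str]) -> dict[str, str]:
    """Return the tools that are not on PATH, with an install hint for each one."""
    return {name: hint for name, hint in required.items() if shutil.which(name) is None}


def main() -> int:
    """Print a report and return a process exit code (0 = ready, 1 = missing tools)."""
    missing = missing_tools(REQUIRED_TOOLS)
    if not missing:
        print("All required tools are installed.")
        return 0
    for name, hint in missing.items():
        print(f"Missing {name}: {hint}")
    return 1
```

In a real `bootstrap.py` I would end the file with `sys.exit(main())` under the usual `if __name__ == "__main__":` guard, so CI and onboarding docs can rely on the exit code.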

Fly Solo For Flow, Flock For Momentum

My most productive days are the ones where I have a trifecta of:

  • Long stretches of uninterrupted time in a safe environment.
  • A clearly defined, feature oriented goal.
  • A developer environment with tight feedback loops and minimal context switches.

I need to feel as though I am a part of something bigger. In order to feel connected, I personally need to see others who are right in the thick of it, just like me.

The type of team environment in which I thrive is one where:

  • We are in it to support each other as well as our product's users.
  • We proactively look out for and catch each other's mistakes, and if a mistake does slip through, no one is going to chew me out after the fact.
  • We engage in deep conversations about the various ways to improve our product as well as each other.
  • Emotional intelligence is high across the team.

Digital Cleanliness

It is important to keep both our physical and our digital spaces clean and cared for. We fix things as soon as we notice they are broken. We preemptively address known issues before they turn into major problems. I take ownership of the things I can control and I don't fret over the things I can't. If you're familiar with the Stoic way of looking at things, this is basically the same idea.

Communicate.

Let everyone know what you’re up to: dabbling on production boxes, restarting things, deploying. Try not to fly under the radar. We’re here to support each other. No one is going to chew you out if you make a mistake.

Help one another out.

Review one another’s code in a timely fashion. Write small, cohesive PRs which are understandable and can be learned from. Keep pushing each other to do better. Call people out (nicely) when things could have been done better. Mistakes happen. It’s cool. Be open, honest and supportive.

Internal Tools - Enabling Processes

Supporting software development requires a set of tools that I call internal tools. Internal tools enable the myriad processes that drive every product forward.

They can be built or bought, but in either case, they should primarily be seen as a means to an end, where that end is a successful, strong product.

It only takes a small collection of tools, maybe 3 to 5, each handling its respective task extremely well. I find that a small number of tools provides a good constraint: it puts an upper bound on the time spent fiddling with them and on the knowledge transfer needed between team members who use different tools.

  • A tool to collect non-automated documentation. This allows writing RFCs, PRDs, Q&As, data schemas, runbooks, etc.: basically all the background info for each product or service. E.g. Google Docs, notion.so.
  • An analytics dashboard. A single tool that enables observability across the product: monitoring, business metrics, logs, etc. E.g. Grafana, Jaeger, or Google Analytics.
  • A workboard for tasks, bugs, and roadmaps. Ideally it has prioritization, tagging, and ways to filter. No single tool combines all of these perfectly, so think of something like Asana, Trello, or GitHub Projects.
  • Isolated and idempotent developer environments. That is, the local laptop, staging, production, etc. I view our developer environments as a tool in their own right, my rationale being that a good environment can increase momentum and developer productivity.

Data & Discovery

Have a standard method for:

  • Dataset and schema listings: define what is stored, its SLAs, where it is generated from, who is currently using it, who has read-only access, who has write access, etc.
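One way to standardize these listings is to give every dataset the same structured catalog entry. Here is a minimal Python sketch; the field names, the reading of "SLA" as a freshness window in hours, and the example dataset are my own hypothetical choices, not any particular data catalog's schema.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetListing:
    """One hypothetical catalog entry covering the fields listed above."""
    name: str
    description: str          # what is stored
    source: str               # where it is generated from
    freshness_sla_hours: int  # SLA, assumed here to mean maximum acceptable staleness
    consumers: list[str] = field(default_factory=list)  # who is currently using it
    readers: set[str] = field(default_factory=set)      # read-only access
    writers: set[str] = field(default_factory=set)      # write access

    def can_read(self, principal: str) -> bool:
        # Anyone with write access can also read.
        return principal in self.readers or principal in self.writers

    def can_write(self, principal: str) -> bool:
        return principal in self.writers


orders = DatasetListing(
    name="orders",
    description="One row per completed checkout",
    source="checkout-service",
    freshness_sla_hours=24,
    consumers=["finance-dashboard"],
    readers={"analytics-team"},
    writers={"checkout-service"},
)
print(orders.can_read("analytics-team"), orders.can_write("analytics-team"))  # True False
```

Keeping access rules next to the description means the same entry answers both "what is this data?" and "who may touch it?", which is exactly what a newcomer needs to know.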

Lock down employee data for a small subset of users, but don’t strip away the tools the rest of the company builds.

Data Context

When capturing data, whether it is storing user actions or storing business objects in some SQL database, there needs to be some surrounding context as to what is being stored, where it is accessed, how it is modified, and what role it currently plays in the product.

Bar Chart it up


PII

With the world increasingly focused on privacy and on transparency about any data collected, thinking about retention and purging processes from the outset is probably the best risk-prevention strategy for any project.
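Thinking about retention from the outset can be as concrete as writing the policy down as data and checking records against it. Here is a minimal Python sketch; the record categories and retention windows are hypothetical, and treating categories without a policy as expired is a deliberate "fail closed" choice of mine, not a legal requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    record_id: str
    category: str  # hypothetical categories such as "analytics_event"
    created_at: datetime


# Hypothetical retention policy: how long each category of data may be kept.
RETENTION = {
    "analytics_event": timedelta(days=90),
    "support_ticket": timedelta(days=365),
}


def expired(records: list[Record], now: datetime,
            retention: dict[str, timedelta]) -> list[Record]:
    """Return records older than their category's retention window.

    Categories without a policy are treated as expired, so nobody can start
    collecting a new kind of data without deciding how long to keep it.
    """
    out = []
    for r in records:
        window = retention.get(r.category)
        if window is None or now - r.created_at > window:
            out.append(r)
    return out


now = datetime(2022, 6, 1)
records = [
    Record("a", "analytics_event", now - timedelta(days=10)),
    Record("b", "analytics_event", now - timedelta(days=120)),
    Record("c", "support_ticket", now - timedelta(days=120)),
    Record("d", "raw_clickstream", now - timedelta(days=1)),
]
print([r.record_id for r in expired(records, now, RETENTION)])  # ['b', 'd']
```

The purge job itself would then delete whatever `expired` returns on a schedule; keeping the selection logic separate makes it easy to dry-run and audit before anything is actually removed.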

Builds & Deploys



Common System Pain Points & Diagnosis

Monitoring

Logging

Stats

Engineering Competencies, A Manifesto

The competencies which make up this manifesto are broken into a number of distinct areas:

  • Foundational, meaning the non-negotiable principles I must abide by.
  • Social and Collaborative, which involves my peers and the ecosystem that affects me.
  • Operational, which I define as developing, running, and maintaining software.

The Required Foundation

I must have a hunger to learn, to explore new ideas, and to gain new skills. I should improve my work over time, incorporating any relevant new knowledge into past work. The things I read or learn should never be just trivial facts to be accumulated and amassed. Instead, I should seek out knowledge that drives improvements in my future self.

I must go above and beyond understanding existing solutions by seeking out how and why the current state of the world is what it is. I should try to look at things from first principles whenever possible.

My time should be managed and prioritized, even when I encounter complicated, hard-to-track-down bugs that might pull me down a rabbit hole. When blocked, I must have the humility, no matter how embarrassing the situation, to ask for help that could unblock me.

Things which matter most must never be at the mercy of things which matter least - Goethe

Socially Conscious

I must be considerate and resourceful with the intellectual resources around me. That means I should be kind to my coworkers and respectfully ask for their time when it is relevant for both of us.

I should be able to mediate disagreements or misunderstandings between the various coworkers and teams. I must conduct myself such that both sides of disagreements can trust my technical judgement.

When giving feedback, I should be constructive. Notice I didn't say positive: if a situation arises where I must be critical, I should not compromise the quality of my feedback for superficial reasons.

I must be responsive and able to learn from feedback. Take for example the common occurrence of code reviews. During a code review I should be able to incorporate any feedback into not just the current review, but each review thereafter.

Citizenship

I must attend engineering conferences, take personal notes as much as possible, and share my key takeaways with those around me.

I must actively mentor and deeply engage with my fellow engineers. My interactions with my coworkers should be grounded in kindness, empathy, and directness. My communication style should be based on genuine curiosity and interest, where the desired outcome of my interactions is a relationship that allows my coworkers and me to improve.

I must produce and record tutorials and walkthroughs, and write documentation and expository articles in the style known as classic prose. My documentation and the various things I write should enable other teams to leverage my work independently and without friction.

Knowledge & Docs

I must leave code, projects, and teams better off than when I first began with them. I should improve the existing definition of best practices and contribute my learned experiences to a shared knowledge bank.

I should constantly be concerned with current and potential knowledge gaps. Any gaps should be bridged through documentation, training, and other knowledge transfer activities.

Designing and Novelty

I must conduct thorough research across various groups and domains, and pull together ideas and techniques across disciplines.

When contributing to RFCs, I will focus on: building upon first principles instead of dogma; inside-the-box thinking, where my immediate resources and ingenuity are prioritized over added complexity or abstract brainstorming; proactively engaging in discussions around risk, contingency, and tradeoffs; and always pursuing measurable goals.

Operational Considerations

Software must be written such that it satisfies the following rules:

  • Features are only added when there are clear and defined metrics which can measure the feature’s impact on the existing system.
  • Any code that must be written is written with the understanding that developer and user time are valuable resources.
  • New code must never increase the fragility of the system.

I should enable and execute large-scale system and design changes in projects that need increased efficiency, less code, extensibility, and long-term momentum.

I need to anticipate outages, fail-overs, bad deploys, team and org level changes, and other technical and process related hiccups. I must take an anti-fragile mindset when it comes to anticipating future events.

Observability

Observability should be a focus in every project. It covers three distinct areas: tracing requests through external services; collecting and aggregating metrics across features; and logging events and actions as they occur. Separate, but equally important, are the business and product operations dashboards that facilitate informed decision making.
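The three areas reinforce each other when a single code path feeds all of them. Here is a minimal Python sketch using only the standard library; the `traced` helper, the service name, and the in-process `METRICS` counter are hypothetical stand-ins. Propagating a trace ID through structured logs is a poor man's version of tracing, not a replacement for a real tracing system such as Jaeger.

```python
import json
import logging
import time
import uuid
from collections import Counter
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("checkout")  # hypothetical service name

# In-process metric store; a real system would export these to a metrics backend.
METRICS = Counter()


def traced(path: str, work: Callable[[], object], trace_id: Optional[str] = None) -> str:
    """Run `work`, then emit one structured log line and update request metrics.

    An upstream trace_id is reused when present so log lines from different
    services can be stitched together; otherwise a new one is generated.
    """
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()
    status = "ok"
    try:
        work()
    except Exception:
        status = "error"
        raise  # the failure is still logged and counted by the finally block
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        METRICS[f"requests.{status}"] += 1
        log.info(json.dumps({"event": "request", "path": path, "trace_id": trace_id,
                             "status": status, "elapsed_ms": round(elapsed_ms, 2)}))
    return trace_id


tid = traced("/checkout", lambda: None, trace_id="abc123")
try:
    traced("/checkout", lambda: 1 / 0)
except ZeroDivisionError:
    pass
print(dict(METRICS))  # {'requests.ok': 1, 'requests.error': 1}
```

Emitting JSON rather than free-form text is the design choice that matters most here: it lets a dashboard filter on `trace_id` or `status` without fragile string parsing.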

Graphs-R-us

Not every problem can be solved with graphs, but damn near. I have yet to have a mental model that wasn't clarified or improved by displaying it visually as a graph.


Good design makes a product understandable

Dieter Rams has ten principles of good design. What I like about them is that he defines design so abstractly that the principles apply to practically every domain we encounter, including the design of APIs and software interfaces in general.

Semantic Versioning, A Heuristic for Expertise

Engineers Dilemmas

Engineers each tend to tackle a task in their own particular way, each envisioning their best approach, even though every approach ultimately solves the exact same problem. In a way, engineers embody the proverbial hammer: looking around, imposing their individual biases, and shaping every problem to look like a nail.


Matthew Clemens © 2022