Introduction

* Ryan Cheley (He/Him/His)

* Senior Regional Director of Business Informatics
* Director of Engineering
* Djangonaut Space Navigator
* Admin of Django Commons
* One of the Maintainers of Django Packages
Introduction

* Husband
* Father
* Sports Fan
How to find me

* Website: https://ryancheley.com/

* Mastodon: https://mastodon.social/@ryancheley

* GitHub: https://github.com/ryancheley/

* LinkedIn: https://www.linkedin.com/in/ryan-cheley/
Error Culture
What the heck are all of these emails I get for anyway?
Definition
Alert Definition

a warning of a danger, threat, or problem, typically with the intention of having it avoided or dealt with.
Assumptions

* Done via email
* Are automated
Conversation

This talk is really a way to start a conversation … with me right after the talk

During the conference

or once you’re back at work next week at your organization

What is it?

Specifically, what it is

Why does it happen?

Why it happens

When does it start?

When it starts

Who does it happen to?
Am I in it?

How to tell if you're in an organization that suffers from it

How do I get out?

It may not be as hard as you think!

What is Error Culture?

How many of you have heard the term ‘Error Culture’ before?

A culture that accepts error notifications and ignores them, encouraging a reactive firefighting culture instead of a proactive culture of problem solving
Is that Bad?
YES!
Why is it bad?

* Low Signal to Noise Ratio
* Wait until the & hits the fan
Why does Error Culture happen?
Why does it happen?
* Lack of Understanding
  * What
  * Why
  * Who

A lack of understanding of what the alert is, why it exists, and who it is for

Why does it happen?
* Error/Alert Fatigue
Why does it happen?

GIF of a rocket ship with an OK button that is clicked eight times

Have you ever had to click ‘OK’ a bunch of times? Have you ever had users complain about how many times they have to click OK?

Why does it happen?

* Hero Culture

Potentially the most insidious of all … Hero Culture

Two people each with a fire extinguisher noticing a small fire

These next few slides show one of my favorite comics from Work Chronicles:

“Prevention and Cure” https://workchronicles.com/prevention-and-cure/

We find our hero discovering a problem

One person puts out their fire with their fire extinguisher, while the other watches their fire get bigger

We find our hero watching the problem get bigger

The fire gets bigger still while the other fire is completely out

And bigger

A raging inferno from the fire that was left to grow

Our hero tells everyone about the problem that they have ‘found’

Our hero fixes the problem

OUR HERO

Our hero is recognized for their efforts

OUR HERO

How many of you have ever been the person on the LEFT?

How many of you have been the person on the RIGHT?

Which one feels better? The one on the RIGHT

Which one is actually better for problem solving? The one on the LEFT

When does Error Culture start?
* Internal
* External

Two main classes of reasons

Internal

* WE need to be notified when THIS happens
* MIGHT be useful
* Opted In

WE need to be notified when THIS happens …

This alert MIGHT be useful …

Opted In … Perhaps you’re sent an alert about an error, but there is no context, or the context is missing

External

* Best Practice
* Default Enabled Alerts

When a consultant says it is ‘best practice’ to be notified of an alert but doesn’t provide more context. This is similar to “WE need to be notified when THIS happens” from the Internal section

When external software ships with alerts enabled by default, but with no context or steps for resolution

Who does it happen to?

You might be surprised at the answer … or maybe not

People in Tech

* Developers
* Help Desk Folks
* System Administrators
* Network Administrators
* Directors of Engineering
* Chief Technical Officers

Since we're at a tech conference, the obvious answer is folks in tech. This can be anyone from developers all the way up to CTOs.

Office Workers

* Administrative Assistants
* Office Managers
* Customer Service Representatives
* Account Managers

But you might not realize that this can happen in other areas of life as well.

Sectors / Industries

* Healthcare
* Education
* Agriculture
* Hospitality
Anyone!

Honestly, this can happen to anyone!

Am I in it?

How can you tell?

Ask yourself a few questions
An email trash folder with lots of unread emails from a no-reply style email address

Does your Deleted Items folder look something like this?

With a whole bunch of items from a no-reply style email address?

Question 1
* Is your Deleted Items folder filled with emails from no-reply style addresses that you didn’t even read ... you just deleted them?
Rules Wizard from MS Outlook showing a rule set up to automatically delete emails from no-reply style email addresses

But we’re all smart people in this room, so maybe you get 'smart' and create a rule to get rid of that email so you don't have to see it anymore?

Question 2

* Do you have a rule that just deletes emails?
An error message that has no information or context for why it's important, who it will impact, or how to fix it. It refers to a client library and an IP address without telling you which ones

Maybe you get alerts with no context that are NOT actionable

Question 3

* Do you get alerts and have no idea why or what to do about them?
OUR HERO

Do you have experiences similar to the one we saw in Prevention and Cure?

Question 4
* Are people rewarded for waiting until problems they knew about are big enough to alert everyone about and then resolving them?

Stated another way:

Do you see others around you put out fires that you BOTH knew were coming

and did nothing until the fire got BIG enough to let EVERYONE know about

… and then they get ‘rewarded’ for putting out the fire?

If you answered yes ...

to one or more of the questions from before

You're in an Error Culture
Convinced

Hopefully I’ve convinced you that Error Culture is bad

How can I fix it?

And you might ask …

Good News!

* Individual Contributor
* Chief Technical Officer

No matter where you are on the ‘ladder’ at work (e.g., IC or CTO), you can make a change

You have agency

Where to start?
CHESTERTON'S FENCE

Two people on one side of a fence with a Rhinoceros on the other side in the distance. One of them proclaims the fence to be a dumb one

A word of caution … change should not be made

until the reasoning

behind the current state of affairs

is understood

And how can you gather information to understand?

Ask Questions
Is the Alert Important?
NO
Delete the Alert

* But not JUST the alert
* The mechanism that generates the alert

Because we don’t want ANYONE to have to delete this alert

Is the Alert Important?
YES
Important Alert!

We have an important alert

Is the Alert Actionable?
What does an Actionable Alert Look Like?
Schoolhouse Rock! superhero Verb standing in front of the word VERB

The superhero Verb from Schoolhouse Rock!

Schoolhouse Rock! was a series of short cartoons that aired between other cartoons on Saturday mornings in the 1980s

Examples
Bad

Subject: Super Important Alert about the Server!

Message: The server is unresponsive!

Which server?

Better

Subject: Super Important Alert about the Server!

Message: The server do-web-005 is unresponsive

We know which server now, but what am I supposed to do about it?

Best

Subject: Super Important Alert about the Server!

Message: The server do-web-005 is unresponsive. To resolve this, **do** X

An actionable alert should have a verb in it, e.g., “The server is unresponsive. To fix this, do X.”

The verb here is “do”

Best

Subject: Super Important Alert about the Server!

Message: The server do-web-005 is unresponsive. To resolve this, REBOOT the server

An actionable alert should have a verb in it, e.g., “The server is unresponsive. To fix this, do X.” The verb here is “reboot”
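The Bad, Better, Best progression above can be enforced where alerts are generated, so an alert that doesn't say which server is affected or what to do about it never goes out. This is a minimal Python sketch, not any particular monitoring tool's API; the `Alert` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert payload; the field names are illustrative only."""

    subject: str
    server: str   # which server, e.g. "do-web-005" (answers "Which server?")
    problem: str  # what is wrong, e.g. "is unresponsive"
    action: str   # what to do, leading with a verb, e.g. "REBOOT the server"

    def message(self) -> str:
        # Refuse to build the "Bad" or "Better" versions of the alert.
        if not self.server.strip():
            raise ValueError("alert must name the affected server")
        if not self.action.strip():
            raise ValueError("alert must say what to do (start with a verb)")
        return (
            f"The server {self.server} {self.problem}. "
            f"To resolve this, {self.action}."
        )


alert = Alert(
    subject="Super Important Alert about the Server!",
    server="do-web-005",
    problem="is unresponsive",
    action="REBOOT the server",
)
print(alert.message())
# The server do-web-005 is unresponsive. To resolve this, REBOOT the server.
```

Raising an error when the alert is built is one possible design choice; a team might instead send incomplete alerts to whoever owns the alert definition so the missing verb gets added.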

Actionable Alert!

We have an actionable alert now

Why ...

does the alert exist?

We have an actionable alert now, but do we know WHY we have the alert? If not, we should determine the WHY and document it

Why ...
is it important?

Knowing why an alert exists can help you determine whether it's still needed in the future

Best

Subject: Super Important Alert about the Server!

Message: The server do-web-005 is unresponsive. To resolve this, REBOOT the server

See this link for details on the alert.

Here, we’ve added a link to our Knowledge Management System to help provide context for the alert

Example Link

The server do-web-005 is a test server on Digital Ocean. It is used for project ABC, which is set to be retired on January 1, 2025

Since today is February 8, 2025 …

Maybe this alert isn’t important anymore

But I’d need to verify before disabling the alert
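Documenting why an alert exists, including when the thing it watches is expected to go away, lets a script surface alerts like this one for review. A minimal sketch, assuming a hypothetical inventory record per alert (`AlertRecord`, `needs_review`); it only flags alerts for a person to check, because the point is to verify before disabling anything.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AlertRecord:
    """Hypothetical inventory entry documenting WHY an alert exists."""

    name: str
    server: str
    purpose: str
    retire_on: Optional[date] = None  # retirement date of the related project, if known


def needs_review(record: AlertRecord, today: date) -> bool:
    """Flag (but never disable) alerts whose project is past its retirement date."""
    return record.retire_on is not None and record.retire_on <= today


test_server_alert = AlertRecord(
    name="do-web-005 unresponsive",
    server="do-web-005",
    purpose="Test server for project ABC",
    retire_on=date(2025, 1, 1),
)
print(needs_review(test_server_alert, today=date(2025, 2, 8)))  # True
```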

Example Link

The server do-web-005 is a production server on Digital Ocean. It is a mission critical server for claims adjudication

Oh no ... drop everything and get this taken care of now!

Alert Context

* Link
* Embedded

In my examples the context was provided by links …

But embedded context can work as well

There's no one-size-fits-all answer
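Either form of context can be attached when the alert is built. A minimal sketch with a hypothetical `add_context` helper; the knowledge-base URL below is a placeholder, not a real system.

```python
from typing import Optional


def add_context(
    message: str, kb_url: Optional[str] = None, embedded: Optional[str] = None
) -> str:
    """Attach context to an alert message as a link, embedded text, or both."""
    if not kb_url and not embedded:
        # An alert with no context is exactly what we're trying to avoid.
        raise ValueError("alert needs context: a link, embedded text, or both")
    parts = [message]
    if embedded:
        parts.append(f"Context: {embedded}")
    if kb_url:
        parts.append(f"See {kb_url} for details on the alert.")
    return "\n\n".join(parts)


msg = "The server do-web-005 is unresponsive. To resolve this, REBOOT the server."

# Linked context: the details live in the knowledge management system.
print(add_context(msg, kb_url="https://kb.example.com/alerts/do-web-005"))

# Embedded context: the details travel inside the alert itself.
print(add_context(msg, embedded="do-web-005 is a production server for claims adjudication."))
```

A link keeps the alert short and lets the documentation change without touching the alert; embedded text still helps when the reader can't reach the knowledge base. That trade-off is why neither form is the single right answer.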

Who ...

* Should be notified?

Make sure the right people are being notified

Best

Subject: Super Important Alert about the Server!

Message: The server do-web-005 is unresponsive. To resolve this, REBOOT the server

See this link for details on the alert.

Example Link

The server do-web-005 is a production server on Digital Ocean. It is a mission critical server for claims adjudication

Are these the right people?

* Claims team
* Business Analyst
* Developer

The Claims team and the Business Analyst can't do anything; given the security infrastructure, the developer might not be able to do anything either!

This might be good information for them to have, but sending an actionable alert to the wrong people doesn’t help anything

Right People

* Server Administrator

This is the person who can actually perform the action (the “do”) from above
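One way to make sure an alert reaches someone who can act on it is to route each alert type to the people who own the fix, and to treat a missing owner as a bug in the alerting setup rather than a reason to email everyone. A minimal sketch; the alert type name, the routing table, and the addresses are hypothetical placeholders.

```python
from typing import Dict, List

# Hypothetical routing table: who can actually perform the action for each alert type.
ACTION_OWNERS: Dict[str, List[str]] = {
    "server_unresponsive": ["server-admins@example.com"],
}


def recipients(alert_type: str) -> List[str]:
    """Return only the people who can act on this type of alert."""
    owners = ACTION_OWNERS.get(alert_type, [])
    if not owners:
        # Nobody can act on it: fix the routing instead of notifying everyone.
        raise LookupError(f"no owner configured for alert type {alert_type!r}")
    return owners


print(recipients("server_unresponsive"))  # ['server-admins@example.com']
```

The Claims team or Business Analyst could still receive a separate, informational summary; the actionable alert itself goes to the Server Administrator.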

A Venn Diagram showing the areas of importance for an alert; Actionable, Important, Sent to the right people

Frustration -> What am I supposed to do with this information?

Time Waste -> Why did I just do this?

Confusion -> What am I supposed to do?

Conclusion
Pervasive

Error Culture can be pervasive.

Make it better

But you can make it better

Ask Questions
Make Sure that your Alerts are

* Actionable
* Important
* Sent to the Right People
Questions?
Find me

Blog
https://ryancheley.com/

Mastodon
https://mastodon.social/@ryancheley

GitHub
https://github.com/ryancheley/

LinkedIn
https://www.linkedin.com/in/ryan-cheley/