Our code generates business value when it RUNS
a. Not when we write it.
b. People don't give us money to hang our code on the wall
We need to know what our code does when it runs
a. We need to MEASURE that.
Basic fundamental truth about reality: Map is not the territory
a. The map of SFO is NOT the City of SFO
b. The way that we talk about something is not the way the thing that it is
c. Perception is not equal to Reality
d. Both the map and territory are in a constant state of flux
There's a gap between what we think we know, and what we actually know
a. MIND THE GAP
We have a mental model of what our code does, but it's not actually the code itself
a. This is what we use when we work with our code, predict the effects of a change, track down that bug
b. OUR MENTAL MODEL IS OFTEN WRONG
c. When our mental model is wrong: it is confusion.
d. "This code can't possibly work" (it works)
e. "This code can't possibly fail" (it blows up)
There's no physical correlation for what we do
g. We don't have a off-kilter chair to remind us we're bad carpenters
Given two code samples
a. Which is faster? We don't know.
b. We CAN'T KNOW until we MEASURE IT.
This affects how we make decisions
a. How we allocate our resources, etc.
b. Do we play the guess-and-check game to figure out why the app is slow/broken?
c. Or do we instrument the system to allow us to make a better, data-driven decision?
d. Instrumentation allows us to make & improve our mental model that allows us to make better decisions.
We use our mental model to decide what do to.
a. A better mental model makes us better at deciding what to do.
b. A better mental model makes more business value.
BUT ONLY IF WE'RE MEASURING THE RIGHT THING.
a. We need to measure our code where it MATTERS.
b. In the wild, as it's generating business value.
c. In production. Not on your laptop, not on a test cluster.
Five different ways of measuring our code in production:
Metrics.gauge("cities", {cities.size})
Metrics.counter("connections").inc() // .dec()
Metrics.meter("requests", SECONDS).mark()
Metrics.histogram("response-sizes").update({results.size})
Metrics.timer('requests', MILLISECONDS,SECONDS).time( { FN })
When instrumenting/designing a system, we need to ask ourselves two questions
Rule of thumb: If it can affect the business value of your code - ADD A METRIC AND MEASURE IT.
Collect all that data & store it.
Monitor it
Aggregate it
OODA Loop: Applies to software just as much as fighter pilots, just with less shooting and flames (hopefully)
If we do this faster - we will win.
We might write code
We have to generate business value
In order to know how well we're doing that, we need metrics to understand and gather the data.
Aggregate these values for historical perspective to gain long-term patterns
The map is not the territory, we need to improve our mental model of the system to make better decisions