Story Template: Jennifer from ABC Energy

Photo by Christina @ wocintechchat.com on Unsplash

Jennifer's goal is to scale up energy storage projects across the US. She is the kind of person who takes every step carefully and methodically. Think of her as an asset manager with the mind of a financial portfolio manager.

This is an hour from her life in 2017 when she was working with a battery storage operator in California.

I woke up to the sound of chimes on my phone. There was an issue at one of our storage projects, and the site was not able to dispatch energy to the grid.

This was a bit surprising because these systems follow automated processes and require little manual intervention. We checked the logs, and everything seemed to have been fine until the previous night. At some point the batteries had stopped charging but carried on dispatching. Once the charge level reached its minimum threshold, the system automatically stopped dispatching too.

While that explained why the system had stopped, it still didn't explain what had triggered the problem in the first place. I sat down with the system logs to look at every single event from the previous night, hoping to find a clue.

We really needed to put our finger on the exact cause so that it would not happen again. I looked at the dispatch signals, the C-rates for dispatches, the voltage levels, and the meter output, but everything seemed normal. After about half an hour of frantic searching (and some coffee), I came across some log entries that pointed to a warranty issue. On further investigation, I found that the command to stop charging had been triggered by a momentary warranty violation.

I realized that the warranty system was configured to flag any instance of an extreme value, without considering the duration or the magnitude of the violation. A system may momentarily cross a threshold, but that is OK as long as the excursion is neither sustained nor extreme. This was a gap in the way the system was configured.
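To make the gap concrete, here is a minimal sketch of a check that looks at both the duration and the magnitude of a violation instead of reacting to any single reading above a limit. It is only an illustration, not the operator's actual warranty system: the names `Violation`, `find_violations`, and `should_stop_charging`, and the `max_samples` and `max_excess` tolerances, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Violation:
    start: int          # index of the first sample above the limit
    length: int         # number of consecutive samples above the limit
    peak_excess: float  # largest amount by which the limit was exceeded


def find_violations(samples: Sequence[float], limit: float) -> List[Violation]:
    """Group consecutive samples above `limit` into violation episodes."""
    episodes: List[Violation] = []
    start = None
    peak = 0.0
    for i, value in enumerate(samples):
        if value > limit:
            if start is None:
                # A new episode begins at this sample.
                start, peak = i, 0.0
            peak = max(peak, value - limit)
        elif start is not None:
            # The episode ended on the previous sample.
            episodes.append(Violation(start, i - start, peak))
            start = None
    if start is not None:
        # The series ended while still above the limit.
        episodes.append(Violation(start, len(samples) - start, peak))
    return episodes


def should_stop_charging(
    samples: Sequence[float],
    limit: float,
    max_samples: int = 3,     # hypothetical tolerance on duration
    max_excess: float = 5.0,  # hypothetical tolerance on magnitude
) -> bool:
    """Stop only if some episode is sustained or extreme."""
    return any(
        v.length > max_samples or v.peak_excess > max_excess
        for v in find_violations(samples, limit)
    )
```

With a limit of 45, a single reading of 46 would not stop charging under these assumed tolerances, whereas four consecutive readings of 46, or a single reading of 55, would.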

We fixed the issue and restarted the system, and we have not faced a similar problem since.

This led me to believe in the importance of manual oversight of automated systems. An algorithm can only do what it is told, and sometimes it needs to be told again. That requires us to dig deeper, ask questions, and methodically search for answers until we find the root cause.

Since then, Jennifer has built a root cause analysis culture in her team, where incidents are not just fixed but investigated using:

  • Exception reporting: manually look for exceptions or outliers in the data
  • Evidence-based decision making: find and report evidence from the data
  • 5 Whys: keep asking why until there are no more answers; what is left is the truth

Indresh Kumar
