Fallen Victim to Glitches? Downtime? You’re Not Alone. Here’s How AI Can Help Your eCommerce Business


At least three things are certain in this world: death, taxes, and glitches.

Prime Day 2018 kicked off with several hours of glitches that cost Amazon an estimated $1.2 million per minute.

Despite the prevalence of downtime incidents, too often companies assume “it’ll never happen to me”, convinced they’ve taken the necessary steps to avoid outages.

Don’t let denial blind you to the ubiquity of downtime, or to its impact.

In addition to diminished sales, incidents increase customer churn, negatively impact customer experiences, hurt brand reputation, and cause setbacks in search engine rankings — all of which contribute to the multi-billion-dollar costs of unplanned application downtime each year.

While you may have safeguards in place for the kinds of outages that grab media attention, it’s the smaller glitches and unknown-unknowns that are quietly impacting subsets of your customers.

To minimize both small incidents and widespread downtime, you need a better way to detect and resolve issues — which is why many companies are adopting a new wave of analytics that replaces manual processes with artificial intelligence-driven capabilities.

Two challenges that increase time to detect and resolve

Every second counts when it comes to resolving eCommerce downtime.

But none of this is news to those of you in eCommerce. You most likely already have performance monitoring and analytics in place to spot anomalies across your business. And you rely on these systems to help you minimize time to detection (TTD) and time to resolution (TTR) for all manner of glitches and failures.

However, traditional approaches to performance monitoring and analytics can leave a lot to be desired. There are two key challenges associated with traditional analytics that increase both TTD and TTR.

1. False positives increase time to detection

Traditional eCommerce performance monitoring and analytics strategies depend on manual thresholds to detect glitches, downtime incidents, and other issues. When data samples move beyond these thresholds, an alert prompts you to investigate potential problems.

Because manual thresholds can’t adjust for seasonality, related events, and other contributing factors, this system typically generates a large volume of alerts, many of which end up being false positives. Sifting through the noise is a huge obstacle, even if your company’s blessed with a large team of data analysts.
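To make the seasonality problem concrete, here is a minimal sketch in Python. It is not any vendor's algorithm: the order counts, the fixed threshold of 150, and the tolerance of 5.0 are all made-up assumptions. It compares one fixed threshold against a baseline that judges each day only against the same weekday in other weeks.

```python
from statistics import median

def static_alerts(values, threshold):
    """Indices where the value exceeds one fixed, hand-picked threshold."""
    return [i for i, v in enumerate(values) if v > threshold]

def seasonal_alerts(values, period, tolerance=5.0):
    """Indices where a value deviates from what is typical for its slot in the cycle.

    "Typical" is the median of the other points in the same slot (e.g. other
    Wednesdays); the allowed spread is the median absolute deviation (MAD)
    of those points, with a floor of 1.0 so a zero spread cannot trigger alerts.
    """
    alerts = []
    for i, v in enumerate(values):
        peers = [values[j] for j in range(i % period, len(values), period) if j != i]
        base = median(peers)
        spread = median(abs(p - base) for p in peers) or 1.0
        if abs(v - base) > tolerance * spread:
            alerts.append(i)
    return alerts

# Hypothetical daily order counts for four weeks (Mon..Sun); weekends run higher.
# Index 17 (a Thursday) is a real incident: orders collapse to 40.
daily_orders = [
    100, 103,  98, 101,  97, 160, 166,
    102,  99, 101, 103, 100, 158, 163,
     98, 101, 100,  40,  99, 162, 168,
    101, 102,  97,  99, 102, 161, 164,
]

static_hits = static_alerts(daily_orders, threshold=150)
seasonal_hits = seasonal_alerts(daily_orders, period=7)

print(static_hits)    # [5, 6, 12, 13, 19, 20, 26, 27]
print(seasonal_hits)  # [17]
```

In this toy data, the fixed threshold fires on every ordinary weekend and never sees the one real drop, while the weekday-aware baseline raises a single alert on the day that matters.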

Consider how false positives create a Cry-Wolf Effect with price glitches. With manual thresholds, something as simple as a discounted item could trigger an alert for an anomalous spike in sales. As a result, multiple departments across an eCommerce organization are alerted for no reason, wasting valuable time and causing data leaders to doubt the system’s findings.

As employees continue to receive alerts, they’ll become frustrated and start to ignore them. Then, when there is a real price glitch on your eCommerce site, you could have an incident drag on and cost tens of thousands of dollars.

You experience the Cry-Wolf Effect when manual thresholds aren’t dynamic enough to understand the context of anomalies. It’s not enough to identify that behavior has deviated from business-as-usual — you must be able to trust that alerts indicate real problems in your organization. With such high volumes of transactions and network traffic to analyze, manual thresholds just won’t get the job done anymore.

The Cry-Wolf Effect is much less likely with AI analytics, which accounts for related events, such as a sale, and keeps affected items out of its alerts.
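As a rough illustration of what "accounting for related events" can mean, the sketch below filters sales-spike alerts against a promotions calendar. The Promotion and SalesSpike types, the SKUs, and the dates are hypothetical. A real AI analytics product would learn such relationships rather than rely on a hand-maintained list.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Promotion:
    sku: str
    start: date
    end: date  # inclusive

@dataclass(frozen=True)
class SalesSpike:
    sku: str
    day: date

def unexplained_spikes(spikes, promotions):
    """Keep only spikes that no known promotion for the same SKU can explain."""
    return [
        s for s in spikes
        if not any(p.sku == s.sku and p.start <= s.day <= p.end for p in promotions)
    ]

# Hypothetical data: SKU-1 is on sale over a holiday weekend, SKU-2 is not.
promotions = [Promotion("SKU-1", date(2024, 11, 28), date(2024, 12, 1))]
spikes = [
    SalesSpike("SKU-1", date(2024, 11, 29)),  # expected: the item is discounted
    SalesSpike("SKU-2", date(2024, 11, 29)),  # unexplained: possible price glitch
]

to_alert = unexplained_spikes(spikes, promotions)
print([s.sku for s in to_alert])  # ['SKU-2']
```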

2. Siloed legacy tools increase time to resolution

For many eCommerce enterprises, core business systems like CRM, web analytics, and social analytics operate as silos. This isn’t just a problem for business agility and cross-departmental alignment — it negatively impacts your ability to resolve glitches, downtime, and other eCommerce issues.

Because of these silos, your data analysts are left to correlate data across those systems manually. You may get multiple alerts that performance has crossed a certain threshold, but TTR depends on your ability to quickly identify the root cause. Without correlating data, you’ll think that disparate alerts indicate unique anomalies when they might actually point to a single root cause.
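One simple way to stop treating every alert as its own anomaly is to group alerts that fire close together in time, whichever system raised them. The sketch below assumes alerts arrive as (timestamp, source, message) tuples and uses an arbitrary five-minute window. Time proximity is only a heuristic for a shared root cause, not proof of one.

```python
from datetime import datetime, timedelta

def group_alerts(alerts, window=timedelta(minutes=5)):
    """Group alerts into incidents.

    An alert joins the current incident if it fired within `window` of the
    previous alert; otherwise it starts a new incident.
    """
    incidents = []
    for alert in sorted(alerts, key=lambda a: a[0]):
        if incidents and alert[0] - incidents[-1][-1][0] <= window:
            incidents[-1].append(alert)
        else:
            incidents.append([alert])
    return incidents

# Hypothetical alerts from three siloed systems.
alerts = [
    (datetime(2024, 5, 1, 10, 2), "web analytics", "checkout conversions down"),
    (datetime(2024, 5, 1, 10, 0), "APM", "payment API error rate up"),
    (datetime(2024, 5, 1, 10, 4), "CRM", "support tickets up"),
    (datetime(2024, 5, 1, 14, 30), "web analytics", "traffic spike on landing page"),
]

incidents = group_alerts(alerts)
for incident in incidents:
    print([source for _, source, _ in incident])
# ['APM', 'web analytics', 'CRM']
# ['web analytics']
```

The earliest alert in each group (here, the APM alert) is a reasonable first place to look for the root cause.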

In eCommerce, these problems can emerge when, for example, application performance and business operations data are handled separately. A glitch on the application performance side will impact the business side and hurt the customer experience, but you can’t get to the root cause of the problem when the data sits on separate islands.

By correlating application performance management (APM) data with customer experience data, you can see when a drop in purchases is caused by something like an API glitch. Without anomaly detection that can correlate data across all business systems, these kinds of problems can persist for hours or even days, costing an eCommerce business hundreds of thousands of dollars in the process.
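A basic way to check whether a business metric moves with an application metric is a Pearson correlation over the same time window. The sketch below uses made-up per-minute numbers for API error rate and completed purchases. A strong negative value suggests the two are related, though correlation alone does not establish which one caused the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)

# Hypothetical per-minute metrics from two separate systems.
api_error_rate = [1, 1, 2, 1, 20, 25, 22, 2, 1]     # from APM, in percent
purchases = [50, 52, 49, 51, 12, 8, 10, 48, 50]     # from the order system

r = pearson(api_error_rate, purchases)
print(f"correlation: {r:.2f}")  # strongly negative for this data
```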

Similar problems can occur on the backend of an eCommerce business, too. Even slight anomalies in logistics processes can result in significant revenue losses. But without transparency, you might not know how to address minor differences in shipping and delivery estimates that could drastically impact revenue. Understanding the anomalies in shipping metrics and correlating that information with front-end transactional data helps you identify the root causes of logistics inefficiencies and resolve them quickly to improve your bottom line.

Data correlation is especially important for cross-department communication. It eliminates any finger-pointing in the wake of an issue by helping surface the root cause. That way, the right department and stakeholders have a sense of ownership over the incident and can cooperate towards its resolution.

eCommerce companies that replaced manual monitoring with AI analytics significantly reduced downtime

Many of the problems eCommerce companies have with TTD and TTR come down to one root cause — the way they monitor data.

When your data team only had to manage a handful of metrics, traditional monitoring tools may have seemed like enough. But now you have to track the number of sessions, total sales, number of transactions, competitor pricing, clicks by search query, cart abandonment rate, total cart value, and the list goes on and on. Traditional analytics tools can’t intelligently analyze millions of metrics, learn their fluctuating behavior, and identify anomalies — that requires AI and a dedicated anomaly detection solution.

That is why AI analytics, with a dedicated anomaly detection capability, is a necessity (not a luxury) for modern eCommerce companies. When you replace manual thresholds with machine learning algorithms, you’re able to scale to millions of metrics and generate real-time alerts for the anomalies that cause downtime incidents. AI anomaly detection solutions reduce TTD and TTR by:

  • Collecting time-series data from every source across your business, sifting through millions (even billions) of metrics in the process
  • Identifying relationships between data patterns across disparate systems and functions
  • Automatically pinpointing specific events, related anomalies, and other factors associated with anomalies and alerting you in real time
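For a sense of what "learning a metric's behavior" looks like at its very simplest, here is a sketch of an online detector that keeps an exponentially weighted mean and variance for a single metric and flags points that stray too far from them. Commercial anomaly detection is far more sophisticated. The class name, the parameters (alpha, tolerance, warmup), and the session counts are all illustrative assumptions.

```python
class EwmaDetector:
    """Online anomaly detector for one metric, based on an exponentially
    weighted moving average (EWMA) and variance."""

    def __init__(self, alpha=0.3, tolerance=4.0, warmup=5):
        self.alpha = alpha          # how quickly the baseline adapts
        self.tolerance = tolerance  # allowed deviation, in standard deviations
        self.warmup = warmup        # points to observe before alerting
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def update(self, value):
        """Return True if `value` is anomalous, then update the baseline."""
        if self.mean is None:
            self.mean = value
            self.seen = 1
            return False
        deviation = value - self.mean
        std = max(self.var ** 0.5, 1.0)  # floor avoids alerts on tiny noise
        anomalous = self.seen >= self.warmup and abs(deviation) > self.tolerance * std
        if not anomalous:
            # Learn only from normal points so an outage does not become
            # the new baseline.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.seen += 1
        return anomalous

# Hypothetical sessions per minute; the eighth reading is an outage.
sessions = [200, 204, 198, 202, 199, 201, 203, 60, 200, 202]

detector = EwmaDetector()
flagged = [i for i, value in enumerate(sessions) if detector.update(value)]
print(flagged)  # [7]
```

Because each detector holds only a few numbers of state, one instance per metric is cheap, which hints at how this style of approach can scale to many metrics where hand-set thresholds cannot.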

As eCommerce continues to become data-driven, every second counts — and your business can’t afford prolonged downtime that hurts revenue and your brand. Take advantage of AI analytics with a dedicated anomaly detection solution to minimize downtime and make sure your company stays out of the headlines.

David Drai is CEO and a co-founder of Anodot, where he is committed to helping data-driven companies illuminate business blind spots with AI analytics. Previously, he was CTO at Gett, an app-based transportation service used in hundreds of cities worldwide. He also co-founded Cotendo, a content delivery network and site acceleration services provider that was acquired by Akamai Technologies, where he served as CTO. He graduated from the Technion - Israel Institute of Technology with a B.Sc. in computer science.