On Monday, October 19, during ESPN’s Monday Night Football, Disney and Lucasfilm opened advance ticket sales following the debut of a new trailer for Star Wars Episode VII: The Force Awakens. The two-and-a-half-minute trailer immediately generated more than 17,000 tweets per minute. On YouTube it was viewed more than 220,000 times within the first 20 minutes and passed 9 million views 12 hours later.
ESPN’s flagship show, SportsCenter, took advantage of this jock/geek mash-up and tweeted the trailer to its 21 million followers. It should be noted that Disney, which bought Lucasfilm in 2012, also owns ESPN. That made the cross-pollination of media hype all the more logical, because Disney could leverage its own ginormous, if somewhat captive, audience for promotion.
Legions of fans jockeyed for the sweet spot in theaters (some offer seat reservations) and first-viewing bragging rights, and they collectively hit the “confirm purchase” buttons across hundreds of sites at about the same time. Replicating this kind of concurrent user activity, which floods a server with requests, is a hallmark of network performance testing. Clearly, the powers that be underestimated the body count, and servers collapsed under the sheer weight of the fandom.
Try as they might to “Use the Force,” fans were met en masse with “site maintenance” and “web server is returning an unknown error” messages. You can bet that frustrated Jedi wannabes, collectively losing their galactic minds, were muttering “Help me, Obi-Wan Kenobi. You’re my only hope.”
So, What Went Wrong?
Even though online ticket sellers reportedly “prepared” for the launch, many were blindsided by the crashes. For network managers and administrators preparing for this event, human behavior, in the shape of passionate Star Wars fans salivating over the imminent holiday release of the newest film in the popular franchise, should hardly have been a surprise. What was a surprise was the failure of the ticketing infrastructure itself, which buckled under the pressure of the concurrent requests.
Cinemas did anticipate a rush, and some prepared for it well in advance. Alamo Drafthouse reportedly prepared for this day for nine months. Its CEO, Tim League, lamented: “We spun up 40 simultaneous servers and were monitoring the load to instantaneously add more if needed. We hosted our static pages in a state-of-the-art cloud environment that could also instantaneously expand with demand. The massive onslaught of simultaneous users, however, exposed an unforeseen flaw in the ticketing infrastructure itself that we were unable to fix on the fly.”
Even Fandango, a veteran of online sales with 15 years of experience, suffered similar setbacks, with traffic surging to as much as 7 times its typical peak levels. Once operations were restored, it reported record ticket sales: 8 times as many tickets as on the first day of sales for 2012’s The Hunger Games, the previous record holder, which took $155 million in its opening weekend. Taking a page from the Star Wars playbook, cinema chain Odeon injected some levity by tweeting an apology: “Fleet Commander: The ODEON galaxy is experiencing heavy traffic but is now returning to normal force levels. Thanks for your patience.”
The Importance Of Thorough Testing
Well-established stress-testing procedures can determine whether your site can handle unusually high traffic spikes, while ramp tests reveal how much traffic your web server can handle before performance slows to a crawl. Basic load tests help determine web server performance under expected demand, such as normal use of shopping cart functionality. However, traditional testing may not accurately predict what happens under extreme load conditions, as was the case with the environments hosting the ticket-selling services.
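To make the idea of a ramp test concrete, here is a minimal Python sketch. It is not a real load generator: instead of driving traffic at a live server, it steps the offered load up against a textbook M/M/1 queueing model, whose mean response time is 1 / (service rate minus arrival rate), and reports the last step that still meets a latency target. The function names, the 1,000 requests-per-second capacity and the 50 ms target are all hypothetical.

```python
def mm1_latency_ms(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue, in milliseconds.

    Both rates are in requests per second. Once the offered load reaches
    capacity (utilization >= 1) the queue grows without bound, so we
    return infinity.
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return 1000.0 / (service_rate - arrival_rate)


def ramp_test(service_rate: float, start: float, step: float,
              latency_slo_ms: float) -> float:
    """Raise the offered load step by step until latency breaches the target.

    Returns the last load (req/s) that still met the target, or 0.0 if even
    the starting load failed. This is a model of a ramp test, not a real one.
    """
    if step <= 0:
        raise ValueError("step must be positive or the ramp never ends")
    load = start
    last_ok = 0.0
    while mm1_latency_ms(load, service_rate) <= latency_slo_ms:
        last_ok = load
        load += step
    return last_ok


# Hypothetical server: 1,000 req/s capacity, 50 ms mean-latency target.
print(ramp_test(service_rate=1000, start=100, step=100, latency_slo_ms=50))
```

With those hypothetical numbers, stepping by 100 req/s reports 900 req/s as the last passing step, although the model itself would tolerate up to 980 req/s. That gap is why ramp tests usually switch to finer steps near the knee of the curve.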
With many of today’s web applications hosted on scalable, on-demand platforms, new servers and services spin up automatically to accommodate the influx of users as load and access increase. Yet if these new services fail to come up, or come up too slowly, network and application latency actually increases. This can have catastrophic effects on application scale, performance, and user access.
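The cost of servers that never come up can be illustrated with a toy discrete-time simulation. The sketch below is an assumption-laden model, not how any real autoscaler works. Requests arrive each tick, every online server drains a fixed number of them, and the rest wait in a backlog. When the backlog passes a threshold, a hypothetical autoscaler requests one extra server, which comes online after a fixed delay, unless spin-up fails. All parameters are invented, and the 7x surge only loosely echoes the Fandango figure for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalingConfig:
    # All values are hypothetical, chosen only to illustrate the effect.
    initial_servers: int = 4
    per_server_capacity: int = 100  # requests each server drains per tick
    scale_threshold: int = 200      # backlog that triggers a scale-up request
    spin_up_delay: int = 3          # ticks before a requested server is online
    max_servers: int = 40           # cap on online plus pending servers
    spin_up_succeeds: bool = True   # False models servers that never come up


def simulate(arrivals: list[int], cfg: ScalingConfig) -> list[int]:
    """Return the request backlog at the end of each tick."""
    online = cfg.initial_servers
    pending: list[int] = []  # ticks at which requested servers should come online
    backlog = 0
    history: list[int] = []
    for tick, arriving in enumerate(arrivals):
        # Requested servers whose spin-up delay has elapsed either join the
        # pool or, if spin-up fails, silently never appear.
        ready = [t for t in pending if t <= tick]
        pending = [t for t in pending if t > tick]
        if cfg.spin_up_succeeds:
            online += len(ready)
        # Serve what capacity allows; everything else waits in the backlog.
        backlog = max(0, backlog + arriving - online * cfg.per_server_capacity)
        # Autoscaler: request at most one extra server per tick while backlogged.
        if backlog > cfg.scale_threshold and online + len(pending) < cfg.max_servers:
            pending.append(tick + cfg.spin_up_delay)
        history.append(backlog)
    return history


# Normal traffic (300 requests per tick) followed by a 7x surge (2,100 per tick).
arrivals = [300] * 5 + [2100] * 20
healthy = simulate(arrivals, ScalingConfig())
failed = simulate(arrivals, ScalingConfig(spin_up_succeeds=False))
print("final backlog, scaling works:", healthy[-1])
print("final backlog, spin-up fails:", failed[-1])
```

In this toy run the backlog ends at 18,700 queued requests when scaling works and at 34,000 when no new server ever arrives. Even working autoscaling lags a sudden surge because of the spin-up delay, and failed spin-up turns that lag into a backlog that only grows.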
Yet these types of network failures are just what keep my job interesting. I welcome these unexpected outages with the verve and vigour of a Rebel Alliance smackdown, rising to the challenge. Working for a company that creates network stress test equipment and solutions requires a taste for battling forces seemingly beyond one’s control. Plus, if networks were predictable, I’d be out of a job. What experience provides is not the ability to predict network behaviour logically, but an instinct for where to anticipate problems and where trouble spots might arise. With this principle in place, the real testing can begin.
Farce To Force: A Strategy For Testing Times
The secret of today’s most advanced test tools is that they can not only model your network accurately in the laboratory, but also subject that model to highly realistic operating stresses. It is not enough to chuck a load of packets into the system to see if it can carry them, because traffic patterns themselves can produce surprising results.
The traffic pattern of a voice exchange is very different from that of an e-mail exchange or a video stream. Interactions between these patterns superimposed on one network can generate unexpected stress. A valid test must therefore be able to mix and match truly realistic traffic scenarios, and then exaggerate them to see what happens under traffic peaks such as those experienced in the Star Wars ticket gaffe.
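The superposition effect can be sketched in a few lines of Python. The three profiles are hypothetical stand-ins, not measured traffic: voice as a constant-bit-rate stream, video as a steady stream with periodic bursts, and e-mail as rare random bursts. Real test tools model traffic at the packet and session level. This sketch only adds per-second bandwidth figures, then applies an invented surge multiplier to exaggerate the mix.

```python
import random


def voice(seconds: int, calls: int = 50, kbps_per_call: int = 64) -> list[float]:
    # Constant bit rate: each call adds the same load every second.
    return [float(calls * kbps_per_call)] * seconds


def video(seconds: int, streams: int = 20, base_kbps: int = 3000,
          burst_every: int = 10, burst_factor: float = 2.5) -> list[float]:
    # Steady streams with a periodic burst (e.g. keyframes) every `burst_every` s.
    return [streams * base_kbps * (burst_factor if t % burst_every == 0 else 1.0)
            for t in range(seconds)]


def email(seconds: int, rng: random.Random, p_burst: float = 0.05,
          burst_kbps: int = 40_000) -> list[float]:
    # Mostly idle, with occasional large random bursts (attachments, bulk sends).
    return [float(burst_kbps) if rng.random() < p_burst else 0.0
            for _ in range(seconds)]


def mix(*profiles: list[float], surge: float = 1.0) -> list[float]:
    """Superimpose profiles second by second, then scale by a surge factor."""
    return [surge * sum(values) for values in zip(*profiles)]


seconds = 300
rng = random.Random(42)  # fixed seed so the run is repeatable
profiles = [voice(seconds), video(seconds), email(seconds, rng)]

normal = mix(*profiles)
surge_day = mix(*profiles, surge=7.0)  # hypothetical 7x exaggeration
sum_of_averages = sum(sum(p) / seconds for p in profiles)

print(f"sum of per-profile averages: {sum_of_averages:,.0f} kbps")
print(f"peak of the mix:             {max(normal):,.0f} kbps")
print(f"peak under a 7x surge:       {max(surge_day):,.0f} kbps")
```

Because the e-mail profile is random, the exact peak depends on the seed. What holds for any seed is that the mixed peak exceeds the sum of the individual averages, since bursts from different profiles can land in the same second. Sizing a network from averages alone would miss exactly that.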
Then there is another factor: what happens when a cyber attack is added to the mix? A good test solution will connect to a cloud database of the very latest attack patterns, which can then be applied to the model under test. The ability to realistically emulate millions of users accessing a web application at high rates means that administrators can learn first-hand what will happen to the network and the application under many diverse load conditions.
You can use that knowledge to make changes to the network model: tweak its weak points, re-test to confirm the improvement, and then decide whether it is worth upgrading the actual network. Or you can simply note the system’s limits. Forewarned is forearmed, and your crisis management strategy will be built on foreknowledge. Emulation provides the valuable intelligence that supports predictable success, whether in ticket sales, managing attacks, right-sizing a network to maximise ROI and minimise TCO, or simply ensuring that everything continues to run as expected.
Yoda once said, “Difficult to see. Always in motion is the future.” So shouldn’t your test strategy be always in motion, too? Test with the times or, even better, get ahead of them!