Starym

The Second War of the Shifting Sands Behind the Scenes: Engineer's Workshop

 

Blizzard are taking an in-depth look at how the second Ahn'Qiraj gate opening series of events was created, from a technical standpoint. They delve into a bit of history and the first, Vanilla opening and what was learned from that, how they used automated players and stress tests to get the second version working as well as possible, the limitations of the Classic/original code itself, the very first openings of the gates and how GMs were following them live and implementing solutions on the fly, and a whole lot more!

If you have even a slight and passing interest in how things are actually done behind the scenes, it's a fascinating look at one of the biggest events ever in WoW history and how it was improved and handled.

    Blizzard LogoAQ (source)

     Join us for a behind-the-scenes deep dive on recreating one of World of Warcraft’s most iconic events, the Ahn’Qiraj war effort.

     

    War is upon us. Earlier this month, one of the most anticipated events of World of Warcraft: Classic went live: the Ahn’Qiraj war effort. Entire Classic realms, the might of the Horde and Alliance combined, came together, contributing resources to open the gates and unlock the Ahn’Qiraj raids. When the War of the Shifting Sands took place the first (and only) time in 2006, thousands of players from each realm flew or hoofed it over to Silithus to partake in or witness the chaos. The turnout was beyond the development team’s wildest imaginings and, simply put, we were not prepared. Servers quickly became overloaded, and many players were caught in a loop of logging in, disconnecting, and trying to get back online over a 12-hour period while our engineers scrambled to hotfix issues and get players reconnected. While we did manage to stabilize servers during the event, and learned quite a few lessons, we saw opportunities to do better. Fifteen years later, we were ready to recreate one of the most epic moments in WoW history for WoW Classic by focusing on server optimization to combat lag and eliminate server crashes, all while hosting up to twice as many players in Silithus as we did during the event’s debut in 2006.

    In this article, we’ll walk you through how we were able to recreate this highly anticipated event by going over how we use automated players and stress tests to determine breakpoints and handcraft optimization solutions, how we came up with solutions in the software to solve problems that hardware couldn’t, and how we curated a global event with limited server crashes, all while preserving the WoW Classic gameplay experience.

    Recreating the Second War of the Shifting Sands

    We had three specific goals in mind when approaching how we would need to engineer this event: Prevent chain crashes, increase the expected zone player limits, and determine how much lag was tolerable before porting players outside of Silithus. Before we can get into the nuts and bolts of how we maximized server performance, it’s important to understand the constraints we’re working in: the limitations of WoW Classic’s codebase, how population management solutions work, and how they affect gameplay.

    Anubisaths Invade Azeroth

    Beyond Boundaries

    The modern version of World of Warcraft was built upon the foundation of the original codebase released 15 years ago. Since the game’s launch, we’ve developed more modern ways to handle high player counts within Battle for Azeroth, most notably sharding. Shards allow WoW servers to host many more players in-game than we were capable of in 2006. In Battle for Azeroth, we use them to manage servers’ player load by making a copy of a zone (e.g. Zuldazar) once the player count reaches a certain threshold. This neutralizes lag issues by spreading players across different versions of the zone, since player interactions are among the most CPU-intensive operations due to the number of packets that players constantly send to the server for pinpoint accuracy on their movements and spell casts. Additionally, sharding mitigates potential lag issues that can be encountered when transitioning into a new zone where the player count goes over the threshold. Sounds simple enough, except there’s a catch: WoW Classic has been engineered to be a faithful recreation of the original 1.12 game data, which includes preserving its gameplay quirks. In rare cases, shards will cause your quarry, such as an enemy player or NPC, to disappear when phasing into a new zone. Keeping shards in would mean losing some of those nostalgic gameplay moments of chasing players and NPCs across zone boundaries. So we needed to come up with a solution that didn’t interfere with the original gameplay while also allowing us to get more players onto the server without forcing them to suffer through unplayable lag.

    To handle this issue, we elected to use layers, which are copies of entire regions (e.g. Eastern Kingdoms), to manage player population and lag issues while keeping the memorable charm of the original release intact, so players could once again kite world bosses across zones and chase enemy players across borders within a region without the risk of them being reassigned to a different shard. However, layers were designed as a non-permanent solution. Because the original 1.12 release did not use either sharding or layering technologies, we promised players that we would only use layers at the launch of WoW Classic and phase them out over time as players dispersed more evenly throughout the world. There are a few cases in which we still use layering due to incredibly high populations of active players (e.g. North America’s Faerlina), but we have reduced the number of layers active on these realms since the game’s release. With 15 years of buildup, the AQ war was among the most highly anticipated events of WoW Classic, and we expected it to draw the most players into one area, outside of starting areas at the game’s release, without layers to manage them. Without layering or sharding population tech, we had to get creative, and quickly.

    Players Gather around the Gong

    Handcrafting an Unforgettable Experience

    We started looking for a population solution that relied on neither layers nor shards by generating headless clients (automated players) and instructing them to mimic what real players might do, such as casting spells, fighting NPCs, and moving around the area. This allowed us to take a snapshot of what performance could look like with thousands of players interacting in a single zone. After running these simulations, we then organized stress tests with volunteers so we could capture realistic player behavior and see how the two compared. This gave us an indication of certain breakpoints and of which pieces of our server’s code were experiencing the most issues at high player counts. Server frame time measurements were heavily scrutinized to see how close they were to causing a server to become unresponsive, also known as deadlocking.

    The next step was to analyze what was affecting server performance so we could begin breaking down this monumental task into comprehensible goals. What we faced is a polynomial problem: the work grows with roughly the square of the player count, so we can’t solve it by throwing faster hardware at it, because hardware doesn’t get faster at anywhere near that rate. Instead, we have to handcraft the optimization by deliberately choosing which data should be communicated to players and how often. To illustrate this conundrum, let’s say we have 20 players jumping in a circle. The server relays the actions of each player to the other 19 through packets (data deliverables). In this group of 20, the server processes 380 packets (20 total players * 19 recipients = 380 packets). This issue compounds when more players do the same action in the zone. If we increase our example to 500 players, then 249,500 packets are sent from the server. If we increase our example again to 1,500 players, then 2,248,500 packets are sent from the server. Depending on player actions, multiple packets are sent per second, and the above examples only account for one action. The more packets the server has to handle, the longer it spends processing a single player before going on to handle every other player’s actions. When this problem compounds, the servers begin to approach deadlocking. In WoW Classic, we have significantly more players per realm than realms did back in 2006, so the expectation is that we accommodate more players around the gates than we ever did before.
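The arithmetic in the example above can be reproduced with a few lines of Python. This is only a sketch of the counting argument, not Blizzard's server code; the function name `fanout_packets` and the `actions_per_player` parameter are made up for illustration.

```python
def fanout_packets(players: int, actions_per_player: int = 1) -> int:
    """Packets sent when every player's action is relayed to every other player."""
    return players * (players - 1) * actions_per_player


for n in (20, 500, 1500):
    print(f"{n:>5} players -> {fanout_packets(n):>9,} packets per action")

# Tripling the crowd multiplies the packet count by roughly nine.
growth = fanout_packets(1500) / fanout_packets(500)
print(f"500 -> 1500 players: about {growth:.1f}x the packets")
```

Because the count grows with the square of the crowd size, a server that is twice as fast only buys room for about 1.4 times as many players, which is why the article turns to sending less data rather than buying hardware.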

    Optimizing Server Performance

    Our servers are engineered to crash and restart if they encounter a deadlock, so we knew it was critical to do everything in our power to minimize processing time. After some testing, it became clear that movement was the first thing putting heavy stress on our servers. We began by dropping facing updates (which display the direction a character model is facing) and only sending player updates whenever a player starts, stops, or uses keyboard movement. Since latency is already compromised when that many players are in one place, spending CPU time on minor facing updates made the fidelity worse, so it was better to stop sending them. We made the decision to cull how often we sent movement updates in favor of having more players in a zone. Keep in mind we’re trying to find the breaking point before the servers fall over while allowing as many players into Silithus as possible. After all, it’s better to miss some movement updates than to not be able to log in to your character at all. We also started throttling data that was marked as lower priority. A “less important” action should not be sent at the same rate as a “more important” one. We saw many messages all being sent at once regardless of how important they were, so we optimized the code to send less important information in batches and less frequently.
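As a rough illustration of the movement culling described above, the sketch below filters a stream of movement updates so that facing-only changes are never relayed. The type names (`MoveKind`, `MoveUpdate`) and the player names are hypothetical; the article does not describe the actual data structures.

```python
from dataclasses import dataclass
from enum import Enum, auto


class MoveKind(Enum):
    START = auto()
    STOP = auto()
    KEYBOARD = auto()
    FACING = auto()  # facing-only change, e.g. turning in place


# Assumption: only these kinds are relayed; facing-only changes are dropped.
RELAYED_KINDS = frozenset({MoveKind.START, MoveKind.STOP, MoveKind.KEYBOARD})


@dataclass(frozen=True)
class MoveUpdate:
    player: str
    kind: MoveKind


def cull_updates(updates):
    """Keep only the movement updates worth relaying to other players."""
    return [u for u in updates if u.kind in RELAYED_KINDS]


incoming = [
    MoveUpdate("Aria", MoveKind.START),
    MoveUpdate("Aria", MoveKind.FACING),
    MoveUpdate("Borin", MoveKind.FACING),
    MoveUpdate("Aria", MoveKind.STOP),
]
relayed = cull_updates(incoming)
print(f"relayed {len(relayed)} of {len(incoming)} updates")
```

Every culled update saves one packet per nearby observer, so in a dense crowd the savings scale with the crowd size.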

    Buffs and debuffs were another large hit on our performance. Throughout the world, especially when fighting mobs, buffs and debuffs are applied to units all the time. Though this may not seem like a big deal, with a high concentration of players all around each other, this information needs to be passed around. Similar to throttling low priority data, we now batch the buffs and debuffs to avoid sending multiple packets in succession to players.
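One way to picture the batching described above: rather than sending a packet for each buff or debuff change, changes are collected per recipient and flushed together. This is a minimal sketch under that assumption; `AuraBatcher` and its methods are illustrative names, and the real flush schedule is not given in the article.

```python
from collections import defaultdict


class AuraBatcher:
    """Collects buff/debuff changes and emits one combined packet per recipient."""

    def __init__(self):
        self._pending = defaultdict(list)  # recipient -> queued aura changes

    def queue(self, recipient: str, change: str) -> None:
        self._pending[recipient].append(change)

    def flush(self) -> dict:
        """Return {recipient: tuple_of_changes} and clear the buffer."""
        packets = {r: tuple(changes) for r, changes in self._pending.items()}
        self._pending.clear()
        return packets


batcher = AuraBatcher()
batcher.queue("Aria", "+Mark of the Wild on Borin")
batcher.queue("Aria", "+Curse of Agony on Anubisath")
batcher.queue("Aria", "-Renew on Borin")
batcher.queue("Borin", "+Curse of Agony on Anubisath")

packets = batcher.flush()
print(f"{sum(len(c) for c in packets.values())} changes sent in {len(packets)} packets")
```

Here four changes go out as two packets; a server would call `flush()` on some interval, trading a little freshness for far fewer sends.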

    Managing Player Populations

    Aside from optimizing the servers to handle more players in each zone, it didn’t escape us that it’d be impossible to fit an entire realm’s population (more than double what an original 1.12 WoW realm could handle) inside Silithus. Hard decisions had to be made to limit access to the zone by controlling who we allowed in and how many players we allowed in. We decided that we would only allow level 60 characters inside Silithus and would stop admitting eligible characters once the zone was full. Creating this restriction was the right choice since the event in Silithus is known to be end-game content, and lower-level characters can still participate in the war effort in other zones, such as by slaying the anubisaths that roam The Barrens, a zone intended for level 20 to level 30 players. The second sticking point was that we knew the upper bound for how many players we could handle in an area without crashing the server; the question then became what that number should be reduced to for the best balance of performance and player count. Through testing, we found this number to be around 1,500 players if they were stacked on top of each other. However, since the event takes over the whole zone, we saw minimal performance problems once players spread out.
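The two admission rules above (level 60 only, and no entry once the zone is full) amount to a simple gate. The sketch below assumes a cap of 1,500 purely for illustration; the article gives that figure as the tested limit for tightly stacked players, not as the cap that was actually deployed, and the function name is hypothetical.

```python
REQUIRED_LEVEL = 60
ILLUSTRATIVE_ZONE_CAP = 1500  # assumption: the deployed cap is not stated


def may_enter_silithus(level: int, zone_population: int,
                       cap: int = ILLUSTRATIVE_ZONE_CAP) -> bool:
    """Admit a character only if it is max level and the zone still has room."""
    return level >= REQUIRED_LEVEL and zone_population < cap


print(may_enter_silithus(level=60, zone_population=1200))
print(may_enter_silithus(level=45, zone_population=1200))
print(may_enter_silithus(level=60, zone_population=1500))
```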

    The event was planned to take place in all regions, so we had to make sure it worked across multiple layers. This means that a Scepter-bearer who rang the gong on one layer should begin the event across all other layers connected to that realm. Since the trigger for the event was based on a player interaction, we wanted to ensure the Scepter-bearer was visible across multiple layers so all players on the same realm could see them. This created an interesting problem, since servers now had to relay information that they typically wouldn’t need to communicate to each other. This can create a lot of complications as we compile and send updates through the servers to make sure we mirror the data across multiple layers, potentially to thousands of players.

    We began developing this tech with the introduction of the Stranglethorn Fishing Tournament and applied it to the Onyxia, Nefarian, Zul’Gurub, and Rend world buffs later. Once we felt it worked as intended, we were ready to test it along with our other tech for the AQ war event.

    Horde Players in Silithus

    Experimenting with Solutions

    Now that we had addressed major tech hurdles and implemented several ways to optimize server performance, it was time to test everything we had worked on. We created a shortened version of the 10-hour war, scaled down to only run for an hour.

    During the first stress test, we let nearly all players into the zone to see what would happen. At one point, we were at nearly 150% of the capacity of an entire 1.12 realm. This was when we saw our test realm crash. We knew we had set a very high cap on how many people could be in the zone, and we were seeing numbers that greatly exceeded it. We investigated the issue and realized that the code allowing players to transfer both into and out of a zone was a queue that didn’t process many players at once. This was why players weren’t being ported out and why players were stuck on flight paths for an unusually long time. We restored the server and continued the stress test, adjusting as we went. We slowly lowered the cap to a point where the zone was still laggy but somewhat playable, and still held a much higher number of players than any zone had seen before. The event that was supposed to only take an hour and a half ended up taking up to four hours to complete because of crashes.
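To see why a transfer queue that handles only a few players per server tick leaves people stuck, consider this toy model. The per-tick batch sizes are invented for illustration; the article does not give the real values or describe how the queue was changed.

```python
from collections import deque


def ticks_to_drain(queued_players: int, per_tick: int) -> int:
    """Count server ticks needed to process every queued zone transfer."""
    queue = deque(range(queued_players))
    ticks = 0
    while queue:
        # Each tick handles at most `per_tick` transfers from the front.
        for _ in range(min(per_tick, len(queue))):
            queue.popleft()
        ticks += 1
    return ticks


for batch in (5, 50):
    print(f"{batch:>2} transfers per tick -> {ticks_to_drain(1000, batch)} ticks")
```

With 1,000 players waiting, the small batch takes ten times as many ticks to clear, and in practice new arrivals keep joining the back of the queue while it drains.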

    The second stress test was performed a week later. Upon loading into the stress test, we immediately noticed improvements: players were no longer stuck on flight paths leading into Silithus! We were able to obtain enough data to demonstrate how many players we could comfortably have in Silithus. After both tests, we moved forward with numbers that we felt struck the best balance between managing lag and server stability. These tests allowed us to see whether our optimizations worked, and we consider both tests successful since they allowed us to identify zone caps and iterate on them.

    Spreading Server Solutions Across Azeroth

    Originally, the optimizations were planned to only be active for Silithus during the War of the Shifting Sands. After we determined they’d be safe to roll out globally, we applied them to the entire world in 1.13.5. Once the war effort started, players began turning in supplies and harvesting bug corpses en masse. We saw a massive spike of players not only in Silithus, but also in our capital cities and outside zones. These optimizations helped make these experiences more performant, allowing large-scale PvP battles to take place across Azeroth. Some players even went as far as spawning the world boss Thunderaan to help clear out the other faction from a Hive.

    Even though the gate opening event hadn’t taken place yet, some servers were experiencing strange issues with their war effort not progressing. Some servers were completing their war effort so quickly that it triggered a race condition in the logic of each turn-in, which could prevent the five-day timer from starting. Since the chance of this edge case happening was so small, we were able to fix those servers manually and then address the issue for future realms completing their effort.
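The article does not describe the actual bug, so the following is only a generic illustration of how this class of problem is usually guarded against: the "did this turn-in complete the effort?" check and the counter update happen under one lock, so exactly one turn-in observes the transition to complete. All names here are hypothetical.

```python
import threading


class WarEffort:
    def __init__(self, goal: int):
        self.goal = goal
        self.collected = 0
        self.timer_starts = 0  # stand-in for starting the five-day timer
        self._lock = threading.Lock()

    def turn_in(self, amount: int) -> None:
        # Check and update atomically so concurrent turn-ins cannot all miss,
        # or all claim, the moment the goal is reached.
        with self._lock:
            was_complete = self.collected >= self.goal
            self.collected += amount
            if not was_complete and self.collected >= self.goal:
                self.timer_starts += 1


def simulate(goal: int, players: int, turn_ins_each: int) -> WarEffort:
    effort = WarEffort(goal)

    def worker() -> None:
        for _ in range(turn_ins_each):
            effort.turn_in(1)

    threads = [threading.Thread(target=worker) for _ in range(players)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return effort


effort = simulate(goal=100, players=50, turn_ins_each=10)
print(f"collected={effort.collected}, timer started {effort.timer_starts} time(s)")
```

Without the lock, two near-simultaneous turn-ins could each read a stale total, and the completion step could be skipped or run twice.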

    Once the war efforts had been completed and five days had passed to open the gates, we began monitoring the Chinese realms that were first to open in the world. The first server in China to have an active Gong was Ouro. As we monitored our layer populations, we saw that most players on each layer were in Silithus. The event going off across multiple maxed-out layers for several thousand players at once was something we’d never done before. Though there was apparent lag, our servers didn’t experience any crashes during the first set of China realms opening.

    Bang a Gong!

    On August 4, it was noted that there would be several realms in North America ready to hit their gongs shortly after servers came up from reset. One by one, we watched these realms on Game Master accounts and through our observation tools so we could address any issues that came up. Each realm opened and began the event without issue. Scepter-bearers received their prestigious Black Qiraji Battle Tank mounts, players got to fight even bigger bugs, and we were pleased with the stability. As we were waiting for our first post-reset server to complete its five-day wait period, we noticed a significant issue: events weren’t persisting after server restarts. This meant that if a server crashed or restarted, we would lose all progression in the event. Though this problem had existed since the beginning of WoW Classic’s development, there hadn’t been many cases that needed events to persist across server restarts. Our team was able to address the problem quickly, but we needed to ensure that no further restarts could happen until we were able to deploy a fix and properly catalog the existing status of every war effort in our database without interruption to players.
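The persistence problem above comes down to event progress living only in memory. As a rough sketch of the general remedy, the code below writes event state to durable storage and reloads it after a simulated restart. A JSON file stands in for the database mentioned in the article, and every name and field here is an assumption made for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_event_state(path: Path, state: dict) -> None:
    """Write state to a temp file, then rename it over the real one."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)  # avoids leaving a half-written file behind


def load_event_state(path: Path, default: dict) -> dict:
    """Restore saved state, or fall back to the default if none exists."""
    if path.exists():
        return json.loads(path.read_text())
    return dict(default)


def restart_demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "war_effort.json"
        save_event_state(path, {"realm": "ExampleRealm",
                                "phase": "gates_open",
                                "hours_elapsed": 3})
        # A restart wipes in-memory state; progress is recovered from storage.
        return load_event_state(path, default={"phase": "not_started"})


print(restart_demo())
```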


    Some may argue that allowing servers to crash is what made the original AQ war chaotic, which in turn made it memorable. Instead, we strove to cultivate that same fervor by curating a much more stable experience that could be shared with around 1,500 players in Silithus at the same time on each server. We wanted the memories of the Classic AQ war to be of having as many players as possible play through the 10-hour event without interruption. While a few realms did crash, they fully recovered and were back online within minutes, and no subsequent crashes took place.

    Over 4,000 players worldwide have become Scarab Lords, and that number continues to climb as each server progresses through its war effort. The excitement and engagement on Classic since the AQ war effort began has been incredible to watch, and we’re grateful to all who joined us for the second War of the Shifting Sands!

     
