An Engineering Update on the Dragonflight Launch

The World of Warcraft Engineering Team has posted an update on the Dragonflight launch.

With Dragonflight’s recent launch behind us, we want to take some time to talk with you more about what occurred these past few days from an engineering viewpoint. We hope that this will provide a bit more insight on what it takes to make a global launch like this happen, what can go right, what hiccups can occur along the way, and how we manage them.

Internally, we call events like last Monday “content launch,” because launching an expansion is a process, not one day. Far from being a static game running the same way it did eighteen years ago—or even two years ago—World of Warcraft is in constant change and growth, and our deployment processes change as well.

Expansions now consist of several smaller launches: the code first goes live running the old content, then pre-launch events and new systems turn on, and finally, on content launch day, new areas, quests, and dungeons open up. Each stage changes different things so that we can find and fix problems one stage at a time. But in any large, complex system, the unexpected can still occur.

One change with this expansion was that the content launch was triggered using a timed event, which lets multiple changes to the game all take effect at a particular time. Making these changes manually carries the risk of human error, or of an internal or external tool outage. Using a timed event helps to mitigate these risks.
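As a rough illustration of the idea (not the actual WoW implementation), a timed event can be sketched as a timestamp plus a list of changes that all apply together once the clock reaches it. The names `TimedEvent`, `fire_at`, and the `state` flags below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TimedEvent:
    """A named moment in time with a list of changes that fire together."""
    name: str
    fire_at: float  # epoch seconds (assumed representation)
    actions: List[Callable[[], None]] = field(default_factory=list)
    fired: bool = False

    def tick(self, now: float) -> bool:
        """Run every action once, the first time `now` reaches `fire_at`."""
        if self.fired or now < self.fire_at:
            return False
        for action in self.actions:
            action()
        self.fired = True
        return True


# Hypothetical game state toggled by the event.
state = {"boats_enabled": False, "dragon_isles_open": False}

launch = TimedEvent(name="content_launch", fire_at=1000.0)
launch.actions.append(lambda: state.update(boats_enabled=True))
launch.actions.append(lambda: state.update(dragon_isles_open=True))

launch.tick(now=999.0)   # too early: nothing changes
launch.tick(now=1000.0)  # all changes apply together
launch.tick(now=1001.0)  # already fired: no repeat
```

Because the changes are registered ahead of time, nobody has to flip them by hand at the launch moment, which is the risk the timed event is meant to reduce.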

Another change in Dragonflight: greatly enhanced support for encrypting game data records. Encrypted records allow us to send out our client with the data that the game needs to show cutscenes, share voice lines, or unlock quests, but keep that data from being mined before players get to experience them in-game. We know the community loves WoW, and when you’re hungry to experience any morsel, it’s hard to not spoil yourself before the main course. Encrypted records allow us to take critical story beats and hide them from players until the right time to reveal them.

We now know that the lag and instability we saw last week were caused by the way these two systems interacted. Together, they forced the simulation server (which moves your characters around the world and performs their spells and abilities) to recalculate which records should be hidden more than one hundred times a second, per simulation. Because a great deal of CPU power was spent on these calculations, the simulations became bogged down, and requests from other services to those simulation servers backed up. Players see this as lag and as error messages like “World Server Down”.

As we discovered, records encrypted until a timed event unlocked them exposed a small logic error in the code: a misplaced line of code signaled to the server that it needed to recalculate which records to hide, even though nothing had changed.
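The general shape of this kind of bug can be shown with a small sketch. This is an illustration under assumptions, not the actual server code: the class, method names, and the "dirty flag" structure are all hypothetical. The buggy path raises the "needs recalculation" flag on every tick, while the fixed path raises it only when a timed event has newly fired.

```python
class RecordVisibility:
    """Tracks which records stay hidden until their timed event has fired."""

    def __init__(self, unlock_event_by_record):
        # record id -> name of the timed event that unlocks it (made-up data)
        self.unlock_event_by_record = unlock_event_by_record
        self.fired_events = set()
        self.hidden = set(unlock_event_by_record)
        self.dirty = True          # forces one initial calculation
        self.recalculations = 0

    def on_tick_buggy(self, fired_now):
        # Bug pattern: the flag is raised whether or not anything changed.
        self.fired_events |= fired_now
        self.dirty = True
        self._refresh()

    def on_tick_fixed(self, fired_now):
        # Only signal a recalculation when an event has newly fired.
        newly_fired = fired_now - self.fired_events
        if newly_fired:
            self.fired_events |= newly_fired
            self.dirty = True
        self._refresh()

    def _refresh(self):
        if not self.dirty:
            return
        self.hidden = {
            record
            for record, event in self.unlock_event_by_record.items()
            if event not in self.fired_events
        }
        self.recalculations += 1
        self.dirty = False


def run(method_name, ticks=100):
    """Drive one simulation for `ticks` ticks; the event fires at tick 50."""
    vis = RecordVisibility({"cutscene_1": "content_launch"})
    for t in range(ticks):
        fired = {"content_launch"} if t >= 50 else set()
        getattr(vis, method_name)(fired)
    return vis


buggy = run("on_tick_buggy")
fixed = run("on_tick_fixed")
print(buggy.recalculations, fixed.recalculations)
```

Both variants end with the same records visible; the difference is only in how much work is repeated, which is why a bug of this shape can show up as load rather than as wrong behavior.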

Here’s some insight into how that investigation unfolded. First, the clock strikes 3:00 p.m. PST. We know from testing that the Horde boat arrives first and the Alliance boat arrives next. Many of us are logged in to the game in one window, with our characters sitting on the docks in both locations, while watching logs, graphs, or dashboards in other windows. We’re also on a conference call with colleagues from support teams all over Blizzard.

Before launch, we’ve created contingency plans for situations we’re worried about as a result of our testing. For example, for this launch, our designers created portals that players could use to get to the Dragon Isles in case the boats failed to work.

At 3:02 p.m. the Horde boat arrives on schedule. Hooray! Players pile on, including some Blizzard employees. Other employees wait (they want to be test cases in case we must turn on portals). The players on the boats sail off, and while some do arrive on the Dragon Isles, many more are disconnected or get stuck.

Immediately we start searching logs and dashboards. There are some players on the Dragon Isles map, but not many. Colleagues having issues report their character names and realms as specific examples. Others start reporting spikes in CPU load and in load on the NFS (Network File Storage) that our servers use. Still others are watching in-game, reporting what they see.

Now that we’ve seen the Horde boats, we start watching for the Alliance boats to arrive. Most of them don’t, and most of the Horde boats do not return.

A picture emerges: the boats are stuck, and Dragon Isles servers are taking much longer to spin up than expected. Here’s where we really dig in and start to problem solve.

Boats have been a problem in the past, so we turn on portals while we continue investigating. Our NFS is clearly overloaded. There’s a large network queue on the service responsible for coordinating the simulation servers, making it think simulations aren’t starting, so it launches more and starts to overwhelm our hardware. Soon we discover that adding the portals has made the overload worse, because players can click the portals as many times as they want, so we turn the portals off.
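The feedback loop described here, where delayed "started" messages lead a coordinator to launch more simulations than it needs, can be illustrated with a toy model. Everything below is hypothetical: the function, its parameters, and the tick-based queue are assumptions for illustration, and the `track_pending` variant is a generic guard, not something the post says was used.

```python
from collections import deque


def simulate(track_pending, ticks=10, ack_delay=5, wanted=1):
    """Count how many simulations get launched while acks sit in a queue."""
    ack_queue = deque()  # tick at which each "started" message is delivered
    confirmed = 0
    launched = 0
    for now in range(ticks):
        # Deliver acknowledgements whose queueing delay has elapsed.
        while ack_queue and ack_queue[0] <= now:
            ack_queue.popleft()
            confirmed += 1
        # A naive coordinator ignores launches it has not heard back from.
        pending = len(ack_queue) if track_pending else 0
        missing = wanted - confirmed - pending
        for _ in range(max(missing, 0)):
            launched += 1
            ack_queue.append(now + ack_delay)
    return launched


naive_launches = simulate(track_pending=False)
guarded_launches = simulate(track_pending=True)
print(naive_launches, guarded_launches)
```

In this toy model the naive coordinator keeps launching every tick until the first acknowledgement finally arrives, so the longer the queue delay, the more redundant launches pile onto the hardware.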

As the problems persist, we work on tackling the increased load to get as many players in to play as possible, but the service is not acting like it did in pre-launch tests. We continue to problem-solve the issue and discount things we know aren’t the issue based on those tests.

Despite the lateness in the day, many continue to work while others take off to get rest so they can return early the following day to get a fresh start and relieve those who will work overnight.

By Tuesday morning, we have a better understanding of things. We know we’re sending more messages to clients about quests than usual, although later discoveries will reveal this isn’t causing problems. A new file storage API we’re using is hitting our file storage harder than usual. Some new code added for quest givers to beckon players seems slower than it should be. The service is taking a very long time to send clients all the data changes made in hotfixes. Reports are coming in that players who have made it to the Dragon Isles have started experiencing extreme lag.

Mid-Tuesday morning a coincidence happens: digging deep into the new beckon code, we find hooks for the new encryption system. We start looking at the question from the other side: could the encryption system being slow explain these and other issues we’re seeing? As it turns out, yes, it can. The encryption system being slow explains the hotfix problem, the file storage problem, and the lag players are experiencing. With the source identified, the author of the relevant part of the system was able to identify the error and make the needed correction.

Pushing a fix to code used across so many services isn’t like flipping a switch: new binaries must be pushed out and turned on. We must slowly move players from the old simulations to new ones for the correction to be picked up. In fact, at one point we try to move players too quickly and cause another part of the service to suffer. Some of the affected binaries cannot be corrected without a service restart, which we delay until the fewest players are online so as not to disrupt players who are in the game. By Wednesday, the fix was completely out and service stability had dramatically improved.
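A gradual move like this is often done in small batches with a pause between them, so that downstream services are not hit all at once. The sketch below shows that general pattern only; the function names, the batch size, and the pause are assumptions, and the post does not describe how the real migration was paced.

```python
import time
from typing import Callable, Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def migrate(
    players: List[str],
    move: Callable[[str], None],
    batch_size: int,
    pause_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Move players to the new simulation a batch at a time, pausing between."""
    moved = 0
    for index, batch in enumerate(batches(players, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # give dependent services time to catch up
        for player in batch:
            move(player)  # hypothetical hand-off to a patched simulation
            moved += 1
    return moved


# Usage with stand-ins so the example runs instantly.
moved_players: List[str] = []
pauses: List[float] = []
total = migrate(
    players=[f"player{i}" for i in range(10)],
    move=moved_players.append,
    batch_size=4,
    pause_seconds=30.0,
    sleep=pauses.append,  # record the pause instead of actually waiting
)
print(total, len(pauses))
```

Raising `batch_size` or lowering `pause_seconds` finishes the rollout sooner but pushes more load onto whatever handles each move, which is the trade-off the paragraph above describes.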

While it took some effort to identify the issue and get it fixed, our team was incredibly vigilant in investigating the issue and getting it corrected as quickly as possible. Good software engineering isn’t about never making mistakes; it’s about minimizing the chances of making them, finding them quickly when they happen, having the tools to get in the fixes right away…

…and having an amazing team to come together to make it all happen.


—The World of Warcraft Engineering Team
