Testing in production

When running Sitecore on Azure PaaS, you are most probably familiar with deployment slots and swapping. With a deployment slot, you can deploy your code to the slot while your production environment keeps your website live. When you hit the “swap slots” button, the production and deployment slots are swapped, and visitors arrive on your updated codebase.

In theory, this is a very smooth way of releasing. However, things can often go wrong. If you have app settings or connection strings that differ between the two slots you want to swap, the slots are recycled during swapping. You can set up application initialisation to avoid cold starts, but even then, we noticed that recycles sometimes take place when you don’t expect them. Even when your hot-start setup is done correctly, the site must start up in under 300 seconds, or the swap will fail.

This causes (short) downtime and frustration.

Traffic Routing

Did you notice that in the Azure Portal there is also a small box labelled “traffic %” next to your slots?

This is the “testing in production” functionality in Azure.

The goal of this functionality is to pre-release new features in your code to a portion of your visitors: a kind of soft release before opening the new features to all traffic.

The visitors selected for redirection to the beta slot get a session cookie “x-ms-routing-name” with the slot name as its value. Once you have this cookie, you will always be routed to the slot named in it. This makes sure that you have a consistent session on the same codebase.

To return to the main slot, you can add the query string ?x-ms-routing-name=self to the URL. This updates the cookie and sends the requests of this visit back to the production slot.
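As a small illustration, this is how such a URL could be built with Python’s standard library. The helper name with_routing_override and the example hostname are made up for this sketch; only the parameter name x-ms-routing-name and the value self come from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ROUTING_PARAM = "x-ms-routing-name"  # same name as the routing cookie


def with_routing_override(url: str, slot: str = "self") -> str:
    """Return `url` with the routing query parameter set to `slot`.

    "self" asks for the production slot; any other value is a slot name.
    An existing x-ms-routing-name parameter is replaced, not duplicated.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except an older routing override.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != ROUTING_PARAM
    ]
    query.append((ROUTING_PARAM, slot))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical hostname, used only to show the shape of the result.
print(with_routing_override("https://example.azurewebsites.net/news?page=2"))
```

Passing a slot name instead of the default gives a link that pins a tester to that slot, which can be handy for checking a release before routing any traffic to it.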

Goodbye Swapping

We could also use this functionality to avoid swapping altogether. By routing 100% of all traffic to a slot, the production slot is bypassed and you can redirect traffic without swapping.

Rerouting traffic with testing in production only works if the main production slot is started. So, when using this for deploys instead of swapping, it is prudent to no longer deploy code to the main production slot. In this setup, the main production slot no longer contains any code at all; you always redirect traffic to one of the other slots.

So a release would go as follows:

  1. Deploy new code to the staging slot.
  2. Direct all traffic to the staging slot instead of the live slot by changing the routing percentages on the main production slot. (New code is now live.)
  3. Deploy new code to the live slot.
  4. Direct all traffic to the live slot instead of the staging slot. We are now live and ready for the next release.
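The four steps above can be sketched as a tiny script. This is only an outline of the order of operations: deploy and route_all_traffic_to are hypothetical callbacks that you would wire to your own deployment pipeline and to whatever you use to change the routing percentages; they are not Azure API calls.

```python
from typing import Callable


def release(
    deploy: Callable[[str], None],
    route_all_traffic_to: Callable[[str], None],
    staging: str = "staging",
    live: str = "live",
) -> None:
    """Run one release without swapping; the main production slot is never touched."""
    deploy(staging)                # 1. new code on the staging slot
    route_all_traffic_to(staging)  # 2. new code is now live
    deploy(live)                   # 3. bring the live slot up to date
    route_all_traffic_to(live)     # 4. back on live, ready for the next release


# Dry run with callbacks that only record what would happen.
log = []
release(
    deploy=lambda slot: log.append(f"deploy new code -> {slot}"),
    route_all_traffic_to=lambda slot: log.append(f"100% traffic -> {slot}"),
)
print("\n".join(log))
```

Keeping the two actions as callbacks makes it easy to do a dry run like the one above before pointing the script at a real environment.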

Force Traffic Redirection

There is only one catch in the above scenario. When users are routed to a slot, they get that slot’s cookie. So, when you move the 100% of traffic from one slot to another, there is still traffic on the slot with 0%, because the change only applies to new sessions.

Fortunately, this can be solved by using a bit of Azure magic. Let’s have a small look at this routing functionality in Azure.

In Azure Resource Explorer (available via the webapp in Azure), navigate to <<your-webapp>> /config/web. Have a look at the “RoutingRules” and “RampUpRules” sections.

This is where the settings made in the UI are stored, and this config is what Azure uses to do the routing. When you set the values to zero in the UI, this gets stored in the config. Slots are only added here the first time you route traffic to them, so you can have more slots than routing rules.

When a rule is present (even with routingPercentage = 0), Azure recognises this as a valid path, and routes the requests with a cookie corresponding to this route.
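To make this behaviour concrete, here is a toy model of the decision as we understand it from our observations. It is not Azure’s actual implementation: the function resolve_slot, the shape of the rules dictionary and the draw parameter (a stand-in for Azure’s random choice) are all assumptions made for this sketch.

```python
from typing import Dict, Optional

PRODUCTION = "self"  # value used for the main production slot


def resolve_slot(cookie: Optional[str], rules: Dict[str, float], draw: float) -> str:
    """Pick the slot for one request.

    cookie: value of x-ms-routing-name sent by the visitor, if any.
    rules:  slot name -> routing percentage (a rule may exist with 0).
    draw:   number in [0, 100) standing in for the random choice.
    """
    # A cookie naming a slot that still has a rule is honoured, even at 0 %.
    if cookie is not None and (cookie == PRODUCTION or cookie in rules):
        return cookie
    # Otherwise the rules are evaluated afresh: walk the percentages in order.
    upper = 0.0
    for slot, percentage in rules.items():
        upper += percentage
        if draw < upper:
            return slot
    return PRODUCTION  # traffic not claimed by any rule stays on production


rules = {"staging": 0, "live": 100}
print(resolve_slot("staging", rules, draw=10.0))          # returning visitor, rule still present
print(resolve_slot(None, rules, draw=10.0))               # new visitor
print(resolve_slot("staging", {"live": 100}, draw=10.0))  # same cookie, rule no longer present
```

The first call shows the catch: a returning visitor with a staging cookie stays on staging although that slot is at 0%. The last call shows what happens to the same visitor when no rule for staging exists any more.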

Now, the trick is to edit this config via the resource explorer and remove the routing rule with 0 percent. When a visitor with a cookie makes a request, Azure won’t find that value in the routing rules and will re-evaluate the rules and overwrite the cookie.
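If you prefer not to edit the JSON by hand, the pruning itself is easy to script. The sketch below works on a copy of the config that you have pasted or loaded yourself; it does not talk to Azure. The JSON shape and the field names (properties, experiments, rampUpRules, reroutePercentage) are assumptions about what Resource Explorer shows, so check them against your own config and pass different names if yours differ.

```python
import copy
import json
from typing import Any, Dict, Sequence


def drop_zero_percent_rules(
    config: Dict[str, Any],
    rules_path: Sequence[str] = ("properties", "experiments", "rampUpRules"),  # assumed path
    percentage_key: str = "reroutePercentage",  # assumed field name
) -> Dict[str, Any]:
    """Return a copy of `config` without the routing rules that are set to 0 percent."""
    pruned = copy.deepcopy(config)  # leave the caller's config untouched
    node = pruned
    for key in rules_path[:-1]:
        node = node[key]
    # Only drop rules that explicitly carry a zero percentage.
    node[rules_path[-1]] = [
        rule
        for rule in node[rules_path[-1]]
        if not (percentage_key in rule and float(rule[percentage_key]) == 0)
    ]
    return pruned


# Made-up example config with one rule at 0 percent and one at 100 percent.
example = {
    "properties": {
        "experiments": {
            "rampUpRules": [
                {"name": "staging", "reroutePercentage": 0},
                {"name": "live", "reroutePercentage": 100},
            ]
        }
    }
}
print(json.dumps(drop_zero_percent_rules(example), indent=2))
```

The function returns a new dictionary rather than changing the input, so you can compare before and after before saving anything back through Resource Explorer.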

Releasing like a pro

This way of releasing is far more relaxed than swapping.

  • You have a lot more control over what is going on
  • You can gradually route traffic to the new code base to test
  • Reverting and routing traffic is much faster (a matter of seconds)
  • No more cold starts or recycles
  • Always a nice warmed up website for your visitors

As a bonus, because the production slot contains no code, it recycles a lot faster when a recycle is required. Think of adding custom domain names to your app or changing SSL certificates. A recycle won’t hurt that much anymore: it will be very short, and your app is not impacted and stays nicely warmed up.

7 thoughts on “Testing in production”

  1. Yomna Fouad says:

    Thank you so much for this great post, it really helped us as we were struggling with the fact that swapping causes a recycle even if we don’t have any slot-specific configuration.

    Can you please share your findings with us if you have applied this to an actual environment?

  2. Jon says:

    I really wish it hadn’t taken me four months of searching to find this page. The whole azure swap thing is a total mess. Why they couldn’t just follow load balancing best practices is beyond me.

  3. Chandra says:

    I’m a bit curious to know on how this technique would play with custom DNS that are bound to the production slot. Any ideas?

    1. bart.verdonck says:

      The routing of testing in production is not done on IP level. So it should just work with your setup.

  4. Vineet says:

    Great article, but doesn’t this approach require deploying the application twice: once on staging and then on live? This kind of defeats the purpose of testing the same environment that is going live.
    I might be missing something here.

    1. bart.verdonck says:

      The application is not deployed on the main slot. But it is indeed deployed on the staging as well as on the live slot. However, if you choose to go with the “swapping” methodology, you also have a double deployment of your application: one in the “main” slot, and one in your “staging” slot.

  5. Jeroen Vantroyen says:

    As you suggested, I’m using a live and a staging slot, not using the production slot.

    I’ve found that, when you set the traffic % column to EMPTY instead of zero (the placeholder 0 will become visible, but slightly greyed out), there remains no trace of the routingRule with “0” as value. All traffic is then redirected to the new 100% slot, modifying their cookie. No need to use Resource Explorer anymore.

