Wednesday, 24 February 2016

From Monolith to Microservices - Moving from 7 to 70 services with C# and automation, automation, automation

In this post I want to give a feel for our journey from monolith to microservices and to highlight, at a very high level, the automation that is required to do this (you won't see any code samples in this post).

But first a little history

We are moving from a monolith towards a microservice EDA (Event Driven Architecture). It's been over 2 years in the making but we are getting there (although we still have 2 hefty monoliths to break down and a few databases acting as integration points to eliminate). We are a Microsoft shop using VS2015, C# 6, .NET 4.6, NServiceBus and plenty of Azure services. I'm sure you have read elsewhere that you don't move towards microservices without adequate automation; we have invested heavily in this over the last few years and it's now paying off big time. At the moment it feels like we have a new service or API deployed to prod every other week.

There are many benefits to a microservice style of architecture, which I'm not going to list here; this post is more about automation and tooling in the .NET landscape.

The tools are not the job

Sometimes you get complaints that you are spending too long on DevOps related activities. That 'the tools are not the job', implying that you should be writing code, not tooling around the code! That you should be delivering features, not spending so much time getting things into production! This might have been true in the good old days when you had one monolith, maybe two, with shared databases. It used to be easy to build it on your dev machine and FTP the build up to 'the' live server (better yet, MSDeploy or publish from Visual Studio). Then just hack around with the config files, click click, you're done. In those days you could count on Moore's law to help your servers scale up as you increased load, and you accepted the cost of bigger and bigger servers if you needed them.

But in this modern era scaling out is the more efficient and cheaper solution, especially with the cloud. Small services enable you to scale the parts of the system that need it rather than the whole system. But this architectural style comes at a cost, imagine hand deploying your service to each node running your software (there could be tens)!
We currently have over 70 core services/APIs and over 50 shared libraries (via NuGet). We run operations in 2 different regions, each with a different configuration, different challenges and different opportunities. Each region has a prod and preprod environment. We have a CI system and a test environment which is currently being rebuilt.
So 'the tools are not the job'; maybe that's true, and we do write a lot of C#, but without the tooling we would be spending most of our time building things, deploying things and fixing the things we pushed to prod in an incorrect state.

Technology, Pipelines and Automation

We are a Microsoft shop dealing mainly with C# 6, .NET 4.6, ASP.NET MVC, Web API, SQL Server, NServiceBus, Azure, Rackspace and PowerShell, lots of PowerShell.

We use GoCD from ThoughtWorks as our CI/CD tool of choice, and consequently have invested heavily in this tool chain over the last 2 years. Below is a 10,000 foot view of the 400+ pipelines on our Go server (rapidly moving towards 500).
https://go.....com:8154/go/pipelines
Below is a rough breakdown of the different pipelines and their purposes.

01 NuGet packages - 50+

The NuGet packages mainly contain message definitions used for communication between services on the service bus (MSMQ and Azure queues). Generally each service that publishes messages has a NuGet package containing those messages, so that services can easily communicate using a shared definition.
We also have a number of shared utility packages; these are NuGet packages too.
These pipelines not only build and test the packages but also publish them to our internal NuGet feed on manual request. The whole process of getting a new package into our NuGet package manager is automated, very quick and painless.

These pipelines have 3 stages:
1. Build - After every commit we build 2 artifacts: the NuGet package file and a test DLL. These 2 artifacts are stored in the artifact repository.
2. Test - Upon a green build the tests run.
3. Publish - If the tests are green we have a push-button publish/deploy to our internal NuGet feed.

02 Automation packaging - 3

These pipelines package up the PowerShell that is needed to perform builds and deploys. The packages are deployed by Go (see section 05) to a common location on all the servers that run agents.

03 Databases - 14 

We actually have over 20 databases, but using CI to control the deployment of these is only just gaining traction; we are currently building this side up, so not all DBs are in the CI pipeline yet. Most are still deployed by hand using Redgate tooling, which makes it bearable, but we are slowly moving towards an automated DB deploy pipeline too.

04 Builds - 70+

Every service/API has a build pipeline that is kicked off by a check-in to either SVN or Git. They mainly consist of 2 stages: Build and Test.
Build not only compiles the code into DLLs but also transforms the configuration files and creates build and test artifacts, along with version info files that enable us to easily see which version of the build is deployed on a running server.
Test usually consists of two parts: unit and integration. Unit tests don't access any external resources (databases, file systems or APIs); integration tests do. Simples.
Only pipelines with a green Build and Test stage are allowed to be promoted to the test or preprod environments.

05 Automation deployment - 10+

These pipelines deploy the automation packages defined in section 02 to all the Go agents. Region one has 3 servers in prod and 3 in preprod, so 6 agents. Region two has 2 in prod and preprod, plus a further 2 that host services which are still deployed by hand (legacy systems catching up). Then we have 2 build servers and some test servers as well. All of these servers have the automation (which is mainly PowerShell) automatically pushed to them when it changes. These pipelines are push-button rather than fully automatic; we want to keep full control of the state of the automation on all the servers.

06 Region one prod - 40+

07 Region one preprod - 40+

08 Region two prod - 50+

09 Region two preprod - 60+

Each region has 2 identical environments: one for preparing and testing new software (preprod) and one running the actual system (prod). They are generally the same, apart from a small number of services that can't be run in preprod (generally gateways to external 3rd parties) and services that are still being built and not yet in prod.
These pipelines generally take one of 2 forms: APIs and services.
The deployment of said APIs and services is documented here: http://foldingair.blogspot.co.uk/2014/10/adventures-in-continuous-delivery-our.html

Since that blog article was written we have started to move more and more to Azure PaaS, primarily App Services and WebJobs, with a scattering of VMs for good measure, but in general the deployment pattern stands.

10 Region two test system - 60+

11 Region two test databases - 6

We are currently in the process of building up a new test system (services and databases), which is why the stages are not showing color as yet. Eventually we will deploy every piece of software every night and run full regression tests against it in the early morning, to ensure everything still integrates together as it did yesterday.

Totals

432 pipelines and 809 stages (this was 4 weeks ago when I started to draft this post, I know, I know; it is now 460+ pipelines and 870+ stages). I wonder if this is a lot? But if you are truly doing microservices it's inevitable, isn't it?

The value stream map of a typical service  

Typically one service or API will be associated with 6 pipelines, and will potentially pull dependencies from others via NuGet or other related pipelines. A typical value stream map is shown below:

Typical Value Stream Map

The circle on the left is source control, detailing the check-in that triggered the build (the first box on the left). The 5 deployment pipelines run off this: test is shown at the bottom, pulling directly from the build artifact. The 4 above are the 2 regions, with prod on the right pulling from preprod in the middle, which in turn pulls the artifact from the build.

This gives us great auditability, as you can easily see which builds were deployed to prod and when. Most importantly, you can't deploy to prod unless you have first gone through preprod, you can't go to preprod if any tests failed, and the tests will only run if the code actually built.

Config

Config is also dealt with as part of the build and deploy process. Config transforms are run at the build stage, and the deployment then selects which config to deploy using a convention-based approach, meaning all deployments are the same (templated) and config is easy to set up in the solution itself.
Whilst dealing with config in this manner is not the perfect solution, it has served us really well so far, giving us confidence that all the configs of all services are in source control. We are really strict with ourselves on the servers too: we do not edit live configs on the servers, we change them in source control and redeploy. This does have its issues, especially if you are only tweaking one value and there are other code changes now on trunk, but we manage, and we are currently actively exploring options that will allow us to separate the code from the config while keeping the strict auditability we currently have.

Monitoring and alerts

Monitoring a monolith is relatively straightforward: it's either up and working or it's not! With many small services there are more places for problems to arise, but on the flip side the system is more resilient, as one problem won't cause the whole thing to come tumbling down. For this reason monitoring and alerts are vitally important.

Monitoring and alerts are handled by a combination of things: NServiceBus ServicePulse, Splunk, Go pipelines (Cradiator), monitor us, Raygun, custom PowerShell scripts, SQL monitoring, Azure monitoring, Rackspace monitoring and other server monitoring agents. All of these feed notifications and alerts into Slack channels that developers monitor on their laptops at work and on our phones out of hours. We don't actually get too many notifications flowing through, as we have quite a stable platform given the number of moving parts, which enables us to actually pay attention to failures and be proactive.

Summary

With each region we are planning to bring on board there are ~120-150 pipelines that need to be created. Luckily the templating in GoCD is very good, and you can easily clone pipelines from other regions, but managing all this is getting a bit much.

I just can't imagine even attempting to do this without something as powerful as GoCD.

So the tools are not the job! Maybe, but they enable 'the' job; without this automation we would be in the modern equivalent of DLL hell all over again.

Feedback

What are your thoughts, reader? Does this sound sensible? Is there another way to manage this number of microservices across this number of servers/regions? Please sound off in the comments below.

Thursday, 21 January 2016

To stand or not to stand?

Over my working life I've sat on many chairs. As a consultant you have to make do with what the client provides, but since becoming a permie I've been campaigning for a standing desk. Recently I got my wish, well, in part.

I don't get on with the chairs at work. They are not cheap ones by any means, but by the end of the day I can really feel my neck and shoulders suffering. I'm not sure what it is with these ones in particular; I just don't get on with them. My cheaper chair at home is far better. I've actually been using a kneeling chair for the last year or so and that has been great, it really forces you to sit up straight. It's a little tiring at first but you get used to it, and I don't get the bad shoulders or neck that I used to get on the chair.

But like I said I really wanted to try a standing desk.

I managed to persuade the people that matter to give it a go, and so between us we knocked up an IKEA hack standing desk made of a coffee table and a shelving unit (a low budget experiment, ~£30).

We had to make it low enough to allow the shortest member of the team to use it but we can 'jack up' the shelf using books and boxes to get it higher for other team members.

I can use it all day, but I find my feet and legs get sore mid afternoon and I really need a sit down. The desk is a long way from hydraulic and it's a very big faff to move all my stuff down to the sitting desk, so I've sacrificed my second screen for a mirrored set up so I can stand and sit as I please. I tend to do the morning stood up, then split the afternoon alternating sitting and standing as I feel the need. It's a compromise, but since the screens are big I can work split screen rather than multi monitor without losing too much.



Next stop walking desk :-) I really want to do this to alleviate the stress on the knees and feet. I've never tried it but from what I hear from people it's good. I can't see work going for that, maybe a wobble board though!

Monday, 30 November 2015

Do we need to deploy clean-up work if there is no additional functionality? i.e. no (perceived) business benefit.

Background

We recently did some work on a number of services where all we were doing was removing some old functionality that is no longer used (obsolete code): removing messages, handlers, classes and tests. I like deleting code; it makes things simpler, with less logic to break and fewer places for bugs to hide. Once we finished, we wanted to get all of this released to prod ASAP. We tested all of the affected areas in a large system level test in the preprod environment, with all of the services (and others that had not changed) working together.

But we had push back. 

Push back

The question was: why do you need to release this clean-up in advance of any further work?
By doing so you are making this an active rather than a passive deployment, with the associated extra risk and double the cost.
If you are removing unused code you can just deploy it with the next addition/change to the code, because by testing that you are implicitly testing the absence of the removed code. Even if the new code isn't affected, our deployment checklists cover that situation too – we have already double checked that this removal/clean-up of code won't have an impact on production.

Rebuttal

I broke this down into a number of questions that I thought were being asked in the statement.

Question: Why do many releases instead of one, isn't that more risky?
This is a question from the old skool of thought: releases are big bad things and we should do as few as possible.
Answer: I would say many small releases are inherently less risky.
If we bundle the clean-up with other work and (very unlikely, but) something goes wrong, it will be less clear what caused the issue. Was it the new functionality or the clean-up work that is the culprit?
If we release now, we know what to monitor over the coming days, and we have less to monitor.
If we don't deploy all 8 things now, someone else will (at some point, in some cases many months in the future). This poor soul will need to decide whether the changes we made need testing, what the consequences of the changes are, and worry about whether there are other dependent services to deploy.
Each service is push button, so deploy time is small.
Rollback is not hard if we need it, and right now it has no business consequences. In future, if we release alongside other functionality and the changes we have made break something, we would need to roll back the new functionality too.

Question: If we are removing code, why the need to release anything? There is no new functionality to release.
I guess this is a question about business value: no new business value, no need to release.
Answer: That is true, it's mostly removing old code and cleaning up. But it's just as critical, almost more so, to get this out in a small release sooner, as we may (again unlikely, but possible, we are human) have removed something we should not have.

Question: Won't it be more to test, doing it twice?
This assumes that we manually test everything on every release, where actually we only manually test what has changed, in conjunction with automated testing for the rest.
Answer: We already did good full end-to-end testing of the 8 affected things last week. If we wait until next week or the week after we will have to do the tests again, as the versions of the things to be released will all be different by then, so we would need to test the 8 deployable things again, full stack = an extra day.

Other reasons for deployment:
The changes, what we did and why we did it, are still fresh in our minds. The longer we leave it, the less sure we are that we will be doing the right things.
It's not critical, but I'd prefer not to do a partial deployment (service-X is going to get released soon); I'd like the rest of the clean-up to be deployed too.
Ideally prod and preprod are as similar as possible (for environmental consistency and testing reasons). Any difference between the preprod test system and prod invalidates other testing efforts, because in prod services will be integrating with different versions of other services than those in preprod, making like-for-like testing impossible.

Conclusion

I maintain there is business benefit in doing the deployment now, and in deploying all 8 services at that. To be fair, businesses, managers, stakeholders, even developers (especially senior ones) have all seen their fair share of long deployments, failures and difficult rollbacks, leading ultimately to a fear of deployment. So it's natural to want to avoid the perceived risks. But perversely, by restricting the number of deployments you are actually increasing the likelihood of future failure.
A core philosophy of the DevOps culture is to release early and often (continuous delivery). By doing the things you find painful more often, you master them and make them trivial, thereby improving your mean time to recovery.

The business benefit is ultimately one of developer productivity, testability and system up-time.

Tuesday, 17 November 2015

Splunk alerts to slack using powershell on windows

We use Splunk to aggregate all the logs across all our services and APIs on many different machines. It gives us an invaluable way to report on the interactions of our customers, from new business creation on the back-end servers running NServiceBus, to the day-to-day client interactions on the websites and mobile apps.

We have been investing in more monitoring recently as the number of services (I hesitate to use the buzzword micro, but yes they are small) is increasing. At the present pace I'd say there is a new service or API created almost every week. Keeping on top of all these services and ensuring smooth running is turning into a challenge, which Splunk is helping us to meet. When you add ServiceControl, ServicePulse and ServiceInsight from Particular (makers of NServiceBus), we have all bases covered.

We have recently added alerts to Splunk to give us notifications in Slack when we get errors.

The Setup

We are sending alerts from Splunk to Slack using batch scripts and PowerShell.

Splunk Alerts

First set up an alert in Splunk; this Splunk video tells you how to create an alert from a search result. We are using a custom script which uses arguments as documented here. Our script consists of 2 parts: a bat file and a PowerShell file. The batch file calls the PowerShell script, passing on the arguments.

SplunkSlackAlert.bat script in C:\Program Files\Splunk\bin\scripts
@echo off
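REM Splunk passes the alert details to this script as SPLUNK_ARG_* environment variables; forward them on to the PowerShell script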
powershell "C:\Program` Files\Splunk\bin\scripts\SplunkSlackAlert.ps1 -ScriptName '%SPLUNK_ARG_0%' -NEvents '%SPLUNK_ARG_1%' -TriggerReason '%SPLUNK_ARG_5%' -BrowserUrl '%SPLUNK_ARG_6%' -ReportName '%SPLUNK_ARG_4%'"

SplunkSlackAlert.ps1 lives alongside
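# Receives the alert details passed in from SplunkSlackAlert.bat and posts a summary to a Slack incoming webhook (the webhook URL further down is a placeholder)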
param (
   [string]$ScriptName = "No script specified",
   [string]$NEvents = 0,
   [string]$TriggerReason = "No reason specified",
   [string]$BrowserUrl = "https://localhost:8000/",
   [string]$ReportName = "No name of report specified"
)

$body = @{
   text = "Test for a parameterized script `"$ScriptName`" `r`n This script returned $NEvents events and was triggered because $TriggerReason `r`n The Url to Splunk is $BrowserUrl `r`n The Report Name is $ReportName"
}

#Invoke-RestMethod -Uri https://hooks.slack.com/services/AAAAAAAAA/BBBBBBBBB/CCCCCCCCC -Method Post -Body (ConvertTo-Json $body)

Slack Integration

You can see the call to the Slack API in the Invoke-RestMethod; the Slack documentation for using the incoming webhook is here. There is quite a rich amount of customization that can be performed in the JSON payload, have a play.
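For example (a hypothetical variation, not the exact script we run), the payload could use Slack message attachments to colour code the alert; $slackWebhookUrl here stands in for your own incoming webhook URL:

# Build a richer payload using a Slack attachment, colour coded red via "danger"
$body = @{
   text = "Splunk alert: $ReportName"
   attachments = @(
      @{
         color = "danger"
         title = $ScriptName
         text = "$NEvents events. Triggered because: $TriggerReason`r`nSplunk search: $BrowserUrl"
      }
   )
}
Invoke-RestMethod -Uri $slackWebhookUrl -Method Post -Body (ConvertTo-Json $body -Depth 4)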

Before you can actually use this you must first set up the Slack integration as documented here, which requires you to have a Slack account.

The fruits of our labor:


All the script code is given in my gist here.

Credits:

Thanks to my pair Ruben for helping on this, good work.

Tuesday, 3 November 2015

Developer podcasts v2

A couple of years ago I wrote a blog post about podcasts for developers; this is a follow-up, as I've now got substantially more. That, and a couple of my colleagues have asked for my list recently.

Programming


.NET Rocks! : Feed Url http://www.pwop.com/feed.aspx?show=dotnetrocks&filetype=master
Adventures in Angular : Feed Url http://feeds.feedwrench.com/AdventuresInAngular.rss
All Chariot Podcasts : Feed Url http://chariotsolutions.com/podcasts/show/all-shows/feed/
All Things Pivotal : Feed Url http://pivotalsoftwarepodcast.libsyn.com/rss
Azure Friday - Channel 9 : Feed Url https://channel9.msdn.com/Shows/Azure-Friday/RSS
CodeChat (Audio) - Channel 9 : Feed Url https://channel9.msdn.com/Shows/codechat/feed/mp3
Coding Blocks | Software and Web Programming / Security / Best Practices / Microsoft .NET : Feed Url http://www.codingblocks.net/podcast-feed.xml
Debug : Feed Url http://feeds.feedburner.com/debugshow
Devnology Podcast : Feed Url http://feeds.devnology.nl/DevnologyPodcast
DevRadio - Channel 9 : Feed Url https://channel9.msdn.com/Blogs/DevRadio/RSS
Full Stack Radio : Feed Url https://simplecast.fm/podcasts/279/rss
Functional Geekery : Feed Url http://www.functionalgeekery.com/feed/mp3/
Hack && Heckle : Feed Url http://feeds.feedburner.com/HackAndHeckle
Hanselminutes : Feed Url http://feeds.feedburner.com/Hanselminutes
Herding Code : Feed Url http://herdingcode.com/feed
Javascript Jabber : Feed Url http://feeds.feedwrench.com/JavaScriptJabber.rss
Jesse Liberty - Silverlight Geek : Feed Url http://feeds.feedburner.com/JesseLiberty-SilverlightGeek
MS Dev Show : Feed Url http://msdevshow.libsyn.com/rss
NodeUp : Feed Url http://feeds.feedburner.com/NodeUp
PowerScripting Podcast : Feed Url http://feeds.feedburner.com/PowerScripting
Radio TFS : Feed Url http://feeds.feedburner.com/radiotfs
Ruby Rogues : Feed Url http://rubyrogues.com/podcast.rss
RunAs Radio : Feed Url http://www.pwop.com/feed.aspx?show=runasradio&filetype=master
Simple Programmer Podcast : Feed Url http://simpleprogrammer.libsyn.com/rss
Software Engineering Radio - the podcast for professional software developers : Feed Url http://www.se-radio.net/rss
STLTechTalk Podcast : Feed Url http://stltechtalk.libsyn.com/rss
The Azure Podcast : Feed Url http://feeds.feedburner.com/TheAzurePodcast
The Cognicast - Cognitect Blog : Feed Url http://feeds.feedburner.com/cognicast
The Java Posse : Feed Url http://feeds.feedburner.com/javaposse
The Static Void Podcast : Feed Url http://www.staticvoidpodcast.com/feed/podcast/
This Week On Channel 9 (MP4) - Channel 9 : Feed Url http://s.ch9.ms/shows/This+Week+On+Channel+9/feed/ipod
ThoughtWorks : Feed Url http://feeds.soundcloud.com/users/soundcloud:users:94605026/sounds.rss
WebDevRadio : Feed Url http://webdevradio.com/feed/
Windows Weekly (MP3) : Feed Url http://feeds.twit.tv/ww.xml
YAPP: Yet Another Programming Podcast : Feed Url http://yapp.audio/feed/yapp

Dev-ops


Arrested DevOps : Feed Url http://feeds.podtrac.com/VGAulpN7MY1U
DevOps Cafe Podcast : Feed Url http://devopscafe.libsyn.com/rss
Devops Mastery : Feed Url http://feeds.soundcloud.com/users/soundcloud:users:79143337/sounds.rss
DevOps.com : Feed Url http://feeds.soundcloud.com/users/soundcloud:users:87329161/sounds.rss
Ops All The Things! : Feed Url http://opsallthethings.s3-website-us-east-1.amazonaws.com/podcast.xml
The Food Fight Show : Feed Url http://feeds.feedburner.com/TheFoodFightShow
The Ship Show : Feed Url http://theshipshow.com/podcast.xml

Developer related


Developer On Fire : Feed Url http://feeds.feedburner.com/developeronfire
Get up and CODE! : Feed Url http://feeds.feedblitz.com/getupandcode
Mastering Business Analysis : Feed Url http://masteringbusinessanalysis.com/feed/podcast
Programmer Vs World : Feed Url https://programmervsworld.wordpress.com/category/the-podcast/feed/
Startups For the Rest of Us » Episodes : Feed Url http://www.startupsfortherestofus.com/category/episodes/feed
The Security Influencer's Channel : Feed Url http://contrastsecurity.libsyn.com//rss

Agile


Agile Instructor - Coaching for Agile Methodologies such as Scrum and Kanban : Feed Url http://feeds.feedburner.com/AllThingsAgile
Agile NYC : Feed Url http://feeds.feedburner.com/AgileFM
Agile Weekly Podcast : Feed Url http://integrumtech.com/feed/podcast/
The Agile Coffee Podcast : Feed Url http://agilecoffee.com/feed/podcast/
This Agile Life : Feed Url http://feeds.feedburner.com/thisagilelife/podcast

Non tech


60-Second Mind : Feed Url http://www.scientificamerican.com/podcast/sciam_podcast_i_psych.xml
99% Invisible : Feed Url http://feeds.99percentinvisible.org/99percentinvisible
All items | LSE Public lectures and events | Audio : Feed Url http://www.lse.ac.uk/assets/richmedia/webFeeds/publicLecturesAndEvents_iTunesRssAudioOnlyAllitems.xml
Freakonomics Radio : Feed Url http://feeds.feedburner.com/freakonomicsradio
Friday Night Comedy from BBC Radio 4 : Feed Url http://www.bbc.co.uk/programmes/p02pc9pj/episodes/downloads.rss
Haute Couture Podcast - Claudia Cazacu : Feed Url http://claudiacazacu.podomatic.com/rss2.xml
Monstercat Podcast : Feed Url https://www.monstercat.com/podcast/feed.xml
NPR: Invisibilia Podcast : Feed Url http://www.npr.org/rss/podcast.php?id=510307
NPR: TED Radio Hour Podcast : Feed Url http://www.npr.org/rss/podcast.php?id=510298
Planet Money : NPR : Feed Url http://www.npr.org/templates/rss/podlayer.php?id=93559255
Radiolab from WNYC : Feed Url http://feeds.wnyc.org/radiolab
RI Blog : Feed Url http://www.rigb.org/blog.ajax
TEDTalks (audio) : Feed Url http://feeds2.feedburner.com/tedtalks_audio
TEDTalks (video) : Feed Url http://feeds.feedburner.com/tedtalks_videos

Notes


Bear in mind that some of these podcasts are no longer active. I've kept them in my list because I find past episodes very relevant to the here and now. You can search back to find relevant episodes on anything you care to think of, which is super useful.

Incidentally, my current podcast player of choice is Podcast Addict: variable speed playback, great control over how files are downloaded, brilliant search functionality for local podcasts, and it's very easy to search for and add new podcasts. Use it :-)

My opml file extracted from podcast addict is located here: https://gist.github.com/DamianStanger/9606fcf3fb09cc0bd87f

Tuesday, 16 June 2015

Advanced filtering and navigation on ThoughtWorks GoCD with Tampermonkey

Our problem

We use Go from ThoughtWorks to manage the build and deployment of all our software. Attached below is a screenshot of the current pipelines.

As you can see we have quite a lot going on. In fact we have 390 pipelines (55 service and API builds, 40 NuGet package build/publish pipelines, and the rest deployments to our 5 environments, from test through preprod to live). There are 730 stages in total (build, test, deploy, etc.), and we have 20 Go agents running on different servers.

So you can imagine that finding the pipeline you are looking for is tough. We have adopted naming conventions which help, but it's really quite difficult to find what you are after with the supplied search capabilities.

Also it's very tricky to see if anything is broken (red); there is just no way you will notice it among all the pipelines.

As an aside, we are using Cradiator to give us an information radiator of all our build and deployment pipelines (I blogged about this a year ago), but with over 700 stages it's really in need of an overhaul; that's for another blog post.


The solution

We have created a Tampermonkey script (found here) that enhances the Go pipelines view, allowing you to filter the visible pipelines by status or by keyword.
 

There is also the issue of navigating between the settings and the pipeline history, and vice versa. Often you are investigating a failure: you find the problem in the history and then want to change the settings. Well, there is no link, so we created one to let you navigate between the 2 parts of the UI easily.

Notice the "Settings" link above and the "History" link below, both in the header.

The script can be found here: https://gist.github.com/DamianStanger/d1db873e9297b34160ce#file-go-cd-tampermoneyscript-js

Install

Firstly you need to get Tampermonkey installed in your browser; you can get it from the Chrome web store or http://tampermonkey.net/


Then from the Tampermonkey dashboard add the script and tell it which URLs to attach itself to. The update URL needs to be your instance of Go (ours is https://goserver:8154/go/* ). On the settings tab I also add a couple of user includes for https://goserver:8154/go/home and https://goserver:8154/go/pipelines.

That's it, instantly enjoy better productivity. I imagine this is a good enhancement even if you only have a tenth of the pipelines we have.



Wednesday, 15 October 2014

Blue green web deployment with powershell and IIS

I wanted to follow up my earlier post (about our current CD process) with a more technically focussed one, one that describes the nuts and bolts of the actual blue green deployment.

Technology 

PowerShell, PowerShell and PowerShell, oh and Windows, IIS and Go (build server).

Process

As I described in my earlier post the blue green web deploy consists of these steps:


1. Deploy
1.1 Fetch artifact
1.2 Select the config for this deployment
1.3 Delete the other configs
1.4 Deploy to staging (delete then copy)
1.5 Backup live

2. Switch blue green
2.1 Point live to new code
2.2 Point staging to old code

Blue Green Deployment

Before diving in to the details I should firstly convey what blue green deployments are, and what they are not.
There are a few different ways to implement blue green deployments but they all have the same goals:
1. Allow testing on live without actually being live.
2. Enable deployments to have the smallest possible impact on the live service.
3. Give you an easy roll-back path.

This can be accomplished in many ways. Techniques include DNS switching, directory moving, or virtual path redirecting. 
We have chosen to do IIS physical path redirecting. This allows us to use the same technique on all our environments from test to live, same scripts, same code, and it doesn't require multiple servers as DNS switching would.

The commands used for this demo are:

PS> .\Create-Websites.ps1 -topLevelDomain co.uk
PS> Deploy-Staging -source c:\tmp -websiteName foobarapi -domainName foobar.co.uk
PS> Backup-Live -WebsiteName foobarapi -DomainName foobar.co.uk
PS> Switch-BlueGreen -WebsiteName foobarapi -DomainName foobar.co.uk

The code I'm going to talk through is all located here: https://github.com/DamianStanger/Powershell

Conventions used:

All websites are named name.domain and name-staging.domain
All backing folders are in c:\virtual and are named name.domain.green and name.domain.blue
You don't know if blue or green is currently serving live traffic.
Backups are taken to c:\virtual-backups\name.domain
Log files always live in c:\logs\name.domain
There is always a version.txt and bluegreen.txt in the root of every website/api

In this example I'm using name=foobarapi and domain=foobar.co.uk

The technical detail

This is the meaty stuff; it consists mainly of PowerShell and should work no matter what CI software you are using. I can heartily recommend Go by ThoughtWorks: it has a built-in artifact repository and brilliant dependency tracking through its value stream map functionality.

Setup IIS and backing folders

To test my deployment scripts you will first need to set up the dummy/test folders and IIS websites. For this you can use this script: Create-Websites.ps1. I'm not going to go into the detail of the script as it's not the focus of this post, but it creates your app pool and website.

The code is exercised with the following:
setupWebsite "foobarui" "foobarui-test" $true "green"
applyCert("*.foobar.*") <<optional if you want the sites to have an ssl cert applying>>

This will create 2 websites in IIS pointing to the green and blue folders as per the conventions outlined further above. Finally, apply an SSL certificate using PowerShell; this command will apply the SSL cert to all the websites in this instance of IIS.
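If you don't want to dig through Create-Websites.ps1, the essence of setupWebsite is something like the following (a minimal sketch using the WebAdministration module; the site and folder names follow the conventions above, and the single shared app pool is an assumption of the sketch, not necessarily what the script does):

Import-Module WebAdministration
$site = "foobarapi.foobar.co.uk"

# Backing folders for the two slots
New-Item -ItemType Directory -Force -Path "c:\virtual\$site.green", "c:\virtual\$site.blue" | Out-Null

# App pool plus the live and staging websites; live starts on green, staging on blue
New-WebAppPool -Name $site
New-Website -Name $site -HostHeader $site -Port 80 -PhysicalPath "c:\virtual\$site.green" -ApplicationPool $site
New-Website -Name "foobarapi-staging.foobar.co.uk" -HostHeader "foobarapi-staging.foobar.co.uk" -Port 80 -PhysicalPath "c:\virtual\$site.blue" -ApplicationPool $site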
To remove the created items from IIS issue commands similar to this:
PS> dir IIS:\AppPools | where-object{$_.Name -like "*.foobar.co*"} | Remove-Item
PS> dir IIS:\Sites | where-object{$_.Name -like "*.foobar.co*"} | remove-item
PS> dir IIS:\SslBindings | remove-item

Once you have the websites correctly set up you can then utilise the deploy blue green scripts :-)

Deployment

The blue green deployment module is located here: BlueGreenDeployment.psm1 and will need importing into your PowerShell session with the following command:
PS> Import-module BlueGreenDeployment.psm1
Once you have the module imported you can issue the following commands:
PS> Deploy-Staging -source c:\tmp -websiteName foobarapi -domainName foobar.co.uk
PS> Backup-Live -WebsiteName foobarapi -DomainName foobar.co.uk
PS> Switch-BlueGreen -WebsiteName foobarapi -DomainName foobar.co.uk

Lets dig into these one by one.

1. Deploy-Staging
This is quite straightforward: find the folder that is currently serving staging and copy the new version there. The interesting bit of code is the method of determining which folder to replace with the new version. IsLiveOnBlue and GetPhysicalPath work together to determine the folder in use on staging. Notice the retries inside GetPhysicalPath; I found that sometimes IIS just doesn't want to play, but if you ask it a second time it will?? Don't ask..
The code that actually determines the physical path is:
$website = "IIS:\Sites\$WebsiteName.$domainName"
...
$websiteProperties = Get-ItemProperty $website
$physicalPath = $websiteProperties.PhysicalPath
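The retry itself is nothing clever; it boils down to something like this (a simplified sketch of the idea rather than the exact code in the module):

# Ask IIS for the physical path, retrying a couple of times because the first query occasionally returns nothing
$physicalPath = $null
for ($attempt = 1; $attempt -le 3 -and -not $physicalPath; $attempt++) {
   $physicalPath = (Get-ItemProperty $website).PhysicalPath
   if (-not $physicalPath) { Start-Sleep -Seconds 1 }
}
if (-not $physicalPath) { throw "Could not determine the physical path for $website" }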

The rest of the PowerShell is relatively straightforward.

2. Backup-Live
Backing up live is again pretty standard PowerShell: determine the folder that is serving live, then do a copy. Done.
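In essence it amounts to something like this (a sketch following the folder conventions above, not the exact module code):

# Copy the folder currently serving live into the backup location
$livePath = (Get-ItemProperty "IIS:\Sites\$WebsiteName.$DomainName").PhysicalPath
$backupPath = "c:\virtual-backups\$WebsiteName.$DomainName"
if (Test-Path $backupPath) { Remove-Item $backupPath -Recurse -Force }
Copy-Item $livePath $backupPath -Recurse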

3. Switch-BlueGreen
Performing the switch is actually really easy when it comes to it. Firstly determine which folder (blue or green) is serving live (same code as in the deploy step) and then switch it with the staging website:
Set-ItemProperty $liveSite -Name physicalPath -Value $greenWebsitePath -ErrorAction Stop
The only added complication is rewriting the log file location in the web.config. Log4net only really works well if one process (website) uses one log file. Again, you can look this up yourselves as it is an aside to the main purpose of this post.
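Stripped of the config rewriting, the swap is essentially this (a simplified sketch assuming the site naming conventions above):

$liveSite = "IIS:\Sites\$WebsiteName.$DomainName"
$stagingSite = "IIS:\Sites\$WebsiteName-staging.$DomainName"

# Read the folders each site currently points at
$livePath = (Get-ItemProperty $liveSite).PhysicalPath
$stagingPath = (Get-ItemProperty $stagingSite).PhysicalPath

# Swap them: staging's freshly deployed code goes live, and the old live code becomes the rollback path on staging
Set-ItemProperty $liveSite -Name physicalPath -Value $stagingPath -ErrorAction Stop
Set-ItemProperty $stagingSite -Name physicalPath -Value $livePath -ErrorAction Stop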

Conclusion

The interwebs in general are full of articles/opinions/tales of how bad Windows is to automate, and it actually winds me up. Maybe it used to be true, but I've found that with PowerShell and Go I've been able to automate anything I need. It's so powerful. Don't let the Microsoft haters stop you from doing what needs to be done.

The blue green deployment technique outlined here is working really well for us at the moment and has helped us to take our projects live sooner/quicker and with more confidence.

Automation for the win.