Wednesday, 4 May 2016

Deploying Azure websites and webjobs using your CI system with MSBuild and the Azure PowerShell cmdlets

There are lots of tutorials detailing how to send web apps and web jobs up to Azure as part of your CI pipeline, but I've found them all lacking in the same way: they require you to upload to Azure in the same command that performs your build.
Often this command looks like the following:
msbuild.exe foobar.csproj /p:DeployOnBuild=true /p:PublishProfile="azurePubProfile" /p:Configuration=Release
This simultaneously builds, configures and publishes the project to Azure.
 
This is unacceptable to me, as I want to promote the same built artifact through many environments (CI, test, preprod, prod) where the only thing that changes is the config deployed to each. I don't want to rebuild the code on every deployment, which invites problems ranging from code changes between deployments to changes in the build machine's environment. I want to know that the binaries deployed to prod are exactly the ones that were deployed and tested in all previous environments. To achieve this we need to separate building the artifact from deploying it.

What follows is the method we are now using to build and deploy our services to Azure. It breaks down into 4 simple steps:
  1. Build your solution, firstly restoring any nuget packages if needed
  2. Configure your service for the target environment and zip up the artifact
  3. Configure the powershell session for azure
  4. Publish the zipped artifact to azure

App service (websites)

Build

First, build your solution. If you are using any NuGet packages, restore them first:
nuget restore
Perform the build on the solution
msbuild foobar.sln /t:build /p:configuration=release /p:OutDir=BUILD_OUTPUT_PATH
Grab the published website from ./foobar/BUILD_OUTPUT_PATH/_PublishedWebsites/foobar and save it to your artifact store of choice.
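
For what it's worth, the artifact-capture step might look something like this in PowerShell (the artifact share, pipeline label variable and paths here are illustrative, not part of any standard):
# Illustrative only: capture the published site as a versioned artifact on a file share.
$buildNumber   = $env:GO_PIPELINE_LABEL   # or whatever build label your CI system exposes
$publishedSite = ".\foobar\BUILD_OUTPUT_PATH\_PublishedWebsites\foobar"
$artifactPath  = "\\artifacts\websites\foobar\$buildNumber"

New-Item $artifactPath -ItemType Directory -Force | Out-Null
Copy-Item "$publishedSite\*" -Destination $artifactPath -Recurse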

Deploy

First, apply the correct config for your target environment to the artifact saved from the build. There are many ways to do this, so I won't go into it in detail here.
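
For illustration only, one minimal convention-based approach (the file names here are my own assumption, not a prescription) is to keep a config file per environment in the artifact and copy the right one over the top before zipping:
# Hypothetical convention: the artifact contains Web.ci.config, Web.test.config,
# Web.preprod.config and Web.prod.config alongside the deployable files.
param([string]$Environment = "preprod")

$artifactDir = "C:\deploy\artifact"
Copy-Item "$artifactDir\Web.$Environment.config" "$artifactDir\Web.config" -Force

# Remove the per-environment copies so they don't get deployed.
Remove-Item "$artifactDir\Web.*.config"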

Next, zip up the deployable artifact:
# $sourcedir is the folder containing the configured artifact; $zipfilename is the full path of the zip to create
Add-Type -Assembly System.IO.Compression.FileSystem
$compressionLevel = [System.IO.Compression.CompressionLevel]::Optimal
[System.IO.Compression.ZipFile]::CreateFromDirectory($sourcedir, $zipfilename, $compressionLevel, $false)

Prepare the PowerShell session for the upload to Azure.
Load in your Azure publish settings (certificate):
Import-AzurePublishSettingsFile azure-credentials.publishsettings
Select the desired subscription:
Select-AzureSubscription -SubscriptionId 00000000-0000-0000-0000-000000000001
Publish the website using the Azure cmdlet:
Publish-AzureWebsiteProject -Package buildoutput.zip -Name foobar -Slot 'staging'
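
To tie steps 2-4 together, the whole deploy stage can be a single script along these lines (the names, paths and subscription id are placeholders; treat it as a sketch rather than a drop-in script):
# Sketch of a deploy stage: zip the configured artifact and push it to a slot.
param(
    [string]$ArtifactDir    = "C:\deploy\artifact",
    [string]$ZipFile        = "C:\deploy\buildoutput.zip",
    [string]$SiteName       = "foobar",
    [string]$Slot           = "staging",
    [string]$SubscriptionId = "00000000-0000-0000-0000-000000000001"
)

Add-Type -Assembly System.IO.Compression.FileSystem
$compressionLevel = [System.IO.Compression.CompressionLevel]::Optimal
[System.IO.Compression.ZipFile]::CreateFromDirectory($ArtifactDir, $ZipFile, $compressionLevel, $false)

Import-AzurePublishSettingsFile "C:\deploy\azure-credentials.publishsettings"
Select-AzureSubscription -SubscriptionId $SubscriptionId
Publish-AzureWebsiteProject -Package $ZipFile -Name $SiteName -Slot $Slot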

Webjobs

The process is the same as above, except that the built artifact is found under foobarjob/BUILD_OUTPUT_PATH,
and the cmdlet for creating the webjob is:
New-AzureWebsiteJob -Name foobar -JobName foobarjob -JobType Continuous -JobFile .\buildoutput.zip

Conclusion

It turns out it's actually really easy to achieve the result we desired: the repeatable deployment of a known, tested build artifact to any environment we like.

We use GoCD from ThoughtWorks to orchestrate the above process through its pipelines feature. When all this is combined with staging slots, blue-green deployments and the like, you get a really robust process, excellent auditability, a solid rollback strategy and, not least, peace of mind when pushing to prod.

I'd love to hear how you achieve the same results, whether it's through version control strategies, Azure deployment slot strategies or something else.
Let me know.

Azure app service error 'An attempt was made to load a program with an incorrect format'

I was deploying a new website today and came across an issue running the site on Azure. The site ran fine on my dev machine, both under VS2015 and when set up as a website under my local IIS.

But when running on a freshly created Azure app service I got the error:

Could not load file or assembly 'System.Web' or one of its dependencies. An attempt was made to load a program with an incorrect format.

Exception Details: System.BadImageFormatException: Could not load file or assembly 'System.Web' or one of its dependencies. An attempt was made to load a program with an incorrect format.


[BadImageFormatException: Could not load file or assembly 'System.Web' or one of its dependencies. An attempt was made to load a program with an incorrect format.] System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly ....


After digging around on the interwebs for a while it became apparent it was an issue with either the target architecture or the targeting of the DLLs. My solution was built to target 'Any CPU', but somehow the version of System.Web that was brought down by NuGet was the 64-bit version.

So I figured that changing the target architecture would have the desired effect, and changed the platform from 32-bit (the default for new app services) to 64-bit in the application settings of the Azure portal (portal.azure.com).
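
If you would rather script the change than click through the portal, something along these lines should do it with the AzureRM module (I'm working from memory here; the -Use32BitWorkerProcess parameter name is an assumption worth checking against Get-Help Set-AzureRmWebApp before relying on it):
# Assumes an authenticated AzureRM session (Login-AzureRmAccount) and the AzureRM websites cmdlets.
# -Use32BitWorkerProcess is my assumption for the setting name; $false means run the 64-bit worker.
Set-AzureRmWebApp -ResourceGroupName "myresourcegroup" -Name "mywebapp" -Use32BitWorkerProcess $false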


Problem solved. I hope this will point someone else to a quick resolution in the future.

Friday, 29 April 2016

Output nunit2-compatible XML from the nunit3 console runner in your CI pipeline.

Upgrades

We have recently upgraded many of our .NET projects to use C# 6 and .NET 4.6.1. At the same time we upgraded NUnit to NUnit 3 on all our build servers. The problem was that GoCD (our CI server) doesn't recognize the new NUnit 3 result format, only the older, more ubiquitous JUnit/nunit2 XML format. Whilst failing tests would still cause the build to break, you had to dig into the console output to see the failures rather than looking in the tests tab.

Investigation

The solution to the problem was to get the NUnit 3 console runner to output its results XML in the old nunit2 style; easier said than done. The documentation for the runner is actually quite thin in this regard. All it says is: "The --result option may use any of the following formats. nunit2", but it doesn't actually show how. I ended up in the source code for the nunit3-console tests and saw the answer buried within.

Solution

The final command I ended up with is as follows:
nunit3-console --result:"Result.xml;format=nunit2" --where "cat!=Integration" .\tests.dll
This output could then be imported into Go, and we again get visibility of the number of tests run, passed and failed on the Go server.
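
For reference, in a CI task you might wrap the runner so the stage fails when tests fail; a minimal PowerShell sketch (the paths and category are as above, the wrapper itself is illustrative):
# Illustrative CI wrapper around the NUnit 3 console runner.
& .\nunit3-console.exe --result:"Result.xml;format=nunit2" --where "cat!=Integration" .\tests.dll

# The runner returns the number of failed tests (negative for runner errors), so propagate it.
if ($LASTEXITCODE -ne 0) {
    Write-Host "nunit3-console exited with code $LASTEXITCODE"
    exit $LASTEXITCODE
}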

Thanks to all who responded on the Google group discussion, pointing me in the right direction.

Thursday, 7 April 2016

Applying Azure resource locks to all the databases and storage accounts in a given resource group with PowerShell

If you have followed any of my previous blogs you will know we have tens of microservices (over 50) in our current architecture. With these microservices goes data (lots of data, valuable data). Each service has storage accounts and/or databases (which we don't really want to lose). We have been going through the process of automating the creation of these resources, and in the process we need to ensure they are not accidentally deleted (we have tear-down scripts, which are dangerous in the wrong hands).

Powershell

What follows are some PowerShell commands that add resource locks to all your databases and storage accounts. They took a while to build, but they are very effective; enjoy.
Write-Host -ForegroundColor Cyan "Adding a CanNotDelete lock to all databases"
Get-AzureRmResource `
 | Where-Object {$_.ResourceGroupName -eq "myresourcegroupname" -and `
                 $_.ResourceType -eq "Microsoft.Sql/servers/databases"} `
 | Select-Object `
     ResourceName,ResourceType, `
     @{name="name"; `
       Expression={$_.name.replace("myazuresqlservername/","")}}, `
     @{name="lockname"; `
       Expression={"lock-databases-"+$_.name.replace("myazuresqlservername/","")}} `
 | %{New-AzureRmResourceLock -ResourceGroupName "myresourcegroupname" `
                             -LockLevel CanNotDelete `
                             -LockNotes "Prevent accidental deletion" `
                             -LockName $_.lockname `
                             -ResourceName $_.ResourceName `
                             -ResourceType $_.ResourceType `
                             -Verbose -Force -ErrorAction Stop}

Write-Host -ForegroundColor Cyan "Adding a CanNotDelete lock to all storage accounts"
Get-AzureRmResource `
 | Where-Object {$_.ResourceGroupName -eq "myresourcegroupname" -and `
                 $_.ResourceType -eq "Microsoft.Storage/storageAccounts"} `
 | Select-Object ResourceName,ResourceType,Name, `
                 @{name="lockname"; `
                   Expression={"lock-storageAccounts-"+$_.name}} `
 | %{New-AzureRmResourceLock -ResourceGroupName "myresourcegroupname" `
                             -LockLevel CanNotDelete `
                             -LockNotes "Prevent accidental deletion" `
                             -LockName $_.lockname `
                             -ResourceName $_.ResourceName `
                             -ResourceType $_.ResourceType `
                             -Verbose -Force -ErrorAction Stop}

You can customise this a bit further by replacing the strings "myazuresqlservername" and "myresourcegroupname" with PowerShell variables, and stick it straight into a PowerShell console or a script.

Lock removal

As an aside, if you do subsequently want to delete the DB or storage account you first need to remove the lock like this:
Remove-AzureRmResourceLock -ResourceId /subscriptions/00000000-1111-2222-3333-444444444444/resourceGroups/myresourcegroupname/providers/Microsoft.Sql/servers/myazuresqlservername/databases/mydatabasename -LockName lock-databases-mydatabasename
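
If you would rather not hand-craft that long resource id, you can look the lock up first and remove it by id instead; a sketch using the same naming convention as above:
# Find the lock created earlier and remove it by its id.
$lock = Get-AzureRmResourceLock -ResourceGroupName "myresourcegroupname" `
                                -ResourceName "myazuresqlservername/mydatabasename" `
                                -ResourceType "Microsoft.Sql/servers/databases" `
                                -LockName "lock-databases-mydatabasename"
Remove-AzureRmResourceLock -LockId $lock.LockId -Force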

Feedback

If you found this useful, or you know a better way, please let me know in the comments below. Cheers.

Tuesday, 29 March 2016

The fallout of going the microservice route. Build and deploy headaches

Build/Deploy issues with GoCd

As you may have seen in a previous blog post, we have a lot of pipelines: over 500 now, and that's set to grow by another 200 in the next month as we bring another 3 environments up. 700 pipelines is too much for a vanilla Go instance to handle; the server can sometimes take 3-4 minutes to detect a check-in or to respond to a manual pipeline trigger.
I'm not having a go at Go (no pun intended); I love it and think it's very good at its job. But when you get to this scale, apparently the embedded H2 DB starts to become a bottleneck, so you need to move to the PostgreSQL plugin, which adds more power on the back end and would solve some of the issues I've seen. There is also an issue with the main pipelines UI, which can take 10+ seconds to render with 500 pipelines on the page at once. I put this down to browser performance, as you can see the call to the server coming back quite quickly with the markup; it's just a very big page (15,000+ divs, 1,600+ forms, 4,300+ input fields, all with associated styling and JavaScript). So with all this in mind, we decided to split up our Go server into six smaller instances.

Go server/agent setup

We have over 70 microservices, mainly written in .NET but with a scattering of Node and SPA apps (detailed here). Each is independently deployable, so we decided to split them based on service boundary, of which we have roughly six. We therefore created six Go servers and spread the workload across them so that each server only handles one to two hundred pipelines.

There are three consequences to splitting up the servers like this: any templates get duplicated; you have to keep track of failing builds on six servers instead of just one, which means logging into six servers; and you have to remember which services are on which Go server instance. But by splitting the work by service boundary we keep the value stream map intact, whereas we would lose it if we had split the servers by deployment region, of which we now have three (each with a test, preprod and prod environment).

Installing multiple go agents on one VM, pointing at multiple different go servers

So we have six VMs, each with a Go server installed, and now, with the help of PowerShell, six agents on each VM. Each VM has an agent for each of the servers so that the workload can be spread evenly. The problem is that, whilst a given VM can host several agents (read this), all the agents will point back to the same Go server because the install uses an environment variable. The solution is to hack the config\wrapper-agent.conf file, changing all instances of '%GO_SERVER%' to 'go-serviceboundary1.mydomain.com', the DNS entry for the Go server that agent should report to.

Given I was creating six agents on six machines, I didn't want to do this by hand, so I created a script to do it for me (found here). The more interesting bits are summarised below:

Download the latest go agent:
$goSetupExe = "go-agent-16.2.1-3027-setup.exe"
$client = new-object System.Net.WebClient
$client.DownloadFile("https://download.go.cd/binaries/16.2.1-3027/win/$goSetupExe", "C:\go\$goSetupExe")

Command line install the first agent
& "C:\go\$goSetupExe" /S /SERVERIP=go-serviceboundary1.mydomain.com /D=C:\go\agent1
Copy the install to a new instance
new-item "C:\go\agent$agentNumber" -ItemType Directory
Copy-Item "C:\go\agent1\*" -Destination "C:\go\agent$agentNumber" -Recurse
Remove-Item "C:\go\agent$agentNumber\config\guid.txt"
Remove-Item "C:\go\agent$agentNumber\*.log"

Rewrite the config
(Get-Content "C:\go\agent$agentNumber\config\wrapper-agent.conf").replace('go-serviceboundary1.mydomain.com', "go-$boundedContext.mydomain.com") | Set-Content "C:\go\agent$agentNumber\config\wrapper-agent.conf"

(Get-Content "C:\go\agent$agentNumber\config\wrapper-agent.conf").replace('c:\go\agent1', "c:\go\agent$agentNumber") | Set-Content "C:\go\agent$agentNumber\config\wrapper-agent.conf"

Create and start the second service instance
New-Service -Name "Go Agent$agentNumber" -DisplayName "Go Agent$agentNumber (go-$boundedContext.mydomain.com)" -Description "Go Agent$agentNumber (go-$boundedContext.mydomain.com)" -StartupType Automatic -BinaryPathName "`"C:\go\agent$agentNumber\cruisewrapper.exe`" -s `"c:\go\agent$agentNumber\config\wrapper-agent.conf`""

Start-Service "Go Agent$agentNumber"
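
To tie it all together, the per-agent steps above can sit in a simple loop; a rough sketch, assuming agent 1 was installed first and that service boundaries 2-6 map to agents 2-6 (the boundary names are placeholders):
# Rough sketch: clone the installed agent1 for the remaining five Go servers.
foreach ($agentNumber in 2..6) {
    $boundedContext = "serviceboundary$agentNumber"   # placeholder naming convention

    New-Item "C:\go\agent$agentNumber" -ItemType Directory | Out-Null
    Copy-Item "C:\go\agent1\*" -Destination "C:\go\agent$agentNumber" -Recurse
    Remove-Item "C:\go\agent$agentNumber\config\guid.txt" -ErrorAction SilentlyContinue
    Remove-Item "C:\go\agent$agentNumber\*.log" -ErrorAction SilentlyContinue

    $conf = "C:\go\agent$agentNumber\config\wrapper-agent.conf"
    (Get-Content $conf).replace('go-serviceboundary1.mydomain.com', "go-$boundedContext.mydomain.com") | Set-Content $conf
    (Get-Content $conf).replace('c:\go\agent1', "c:\go\agent$agentNumber") | Set-Content $conf

    New-Service -Name "Go Agent$agentNumber" `
                -DisplayName "Go Agent$agentNumber (go-$boundedContext.mydomain.com)" `
                -StartupType Automatic `
                -BinaryPathName "`"C:\go\agent$agentNumber\cruisewrapper.exe`" -s `"c:\go\agent$agentNumber\config\wrapper-agent.conf`""
    Start-Service "Go Agent$agentNumber"
}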


It works really well for getting a new setup running quickly, and if I ever need to add a new Go server and new agents it will be easy to extend.

Feedback

It would be great to get some feedback on whether this is a good idea or not, or how best to split things up, but I feel this is a nice, pragmatic solution with minimal downsides.

Thursday, 17 March 2016

Pattern matching in sublime text with regex

An exercise in Yak shaving

Recently I had a very large XML file where I needed to do some string manipulation and replacements; Sublime Text is always my go-to editor for this type of thing.
Here is an example snippet.
<data name="NumberToWord_1" xml:space="preserve">
  <value>first</value>
</data>
<data name="NumberToWord_2" xml:space="preserve">
  <value>second-value</value>
</data>
<data name="NumberToWord_3" xml:space="preserve">
  <value>3rd text value</value>
</data>


I needed to select the names and then paste them into the values.

First we need to select the name with one of the following regexes:
(?<=<data name=").+?(?=")
(?<=<data name=")[^"]+
<data name="\K[^"]+
^.*?"\K\w+

The first uses a look-behind, a look-ahead and a non-greedy selector, which is maybe not the easiest to understand. I evolved this into the second regex by doing away with the lazy selector and the look-ahead at the end. The third is basically the second rewritten with the metacharacter \K to reset the start point of the match (keep). Finally, I trimmed away the data part by looking for the first double quote on the line, resetting the keep and after that only selecting word characters.
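
As an aside, if you want to sanity-check one of these outside the editor, you can run it over the snippet above with .NET's regex engine from PowerShell (as far as I know .NET doesn't support \K, so the look-behind form is used here; the file name is illustrative):
# Quick check of the look-behind form against the sample snippet, saved as strings.xml.
$xml = Get-Content .\strings.xml -Raw
[regex]::Matches($xml, '(?<=<data name=")[^"]+') | ForEach-Object { $_.Value }
# NumberToWord_1
# NumberToWord_2
# NumberToWord_3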

Once I had all the name strings highlighted I could use Sublime Text's multiple-selection copy feature to put the 100+ words into the clipboard.

Next, to paste the values back in, we need to select the values:
(?<=<value>).+?(?=</value>)
(?<=<value>)[^<]+
<value>\K[^<]+
^.+?e>\K[^<]+

The first line again uses a look-behind and a look-ahead with a lazy select-all. The second example replaces the look-ahead with a more constrained character class (anything that isn't a <). The third replaces the look-behind with the \K keep reset. Finally, just for fun, I matched up to the first 'e>' on the line with a lazy selector.

So I've now got all the values selected; just paste the multi-cursor clipboard over the top and we're done.

Alternatives

Now there are many ways to skin a cat; I really wanted to play with regex today, so this was a nice exercise to practise some more advanced regex. But I know not everyone likes or gets regex; another approach would be to use the cursor. First select all the lines with a data element (multi-select), go Home and press ctrl+right until the cursor is just after the first quote. Then ctrl+shift+right to select the word, and copy them all. Press the down arrow to move focus to the value element, then End and ctrl+left to get the cursor just inside the end of the element value, ctrl+k, ctrl+space to set a mark, Home and ctrl+left to get the cursor just inside the start of the element value, ctrl+k, ctrl+a to select to the mark, and then paste. Or, for this last step, use ctrl+shift+a to use Sublime's expand selection, which works in XML documents.
There are so many variations on this cursor-based approach that I can't list them all here; suffice to say Sublime Text is very powerful at text manipulation. Learn your tooling, people, and enjoy.

Want to see this in action? Watch this video:

Recorded on KRUT (an open source screen recorder) with keyjedi (to show the keyboard shortcuts).

If you think there is a simpler (or more clever :-) approach, please leave a comment and share. How bald is your Yak now?


Wednesday, 24 February 2016

From Monolith to Microservices - Moving from 7 to 70 services with C# and automation, automation, automation

In this post I want to give a feel for our journey from monolith to microservices and to highlight, at a very high level, the automation that is required to do this (you won't see any code samples in this post).

But first a little history

We are moving from a monolith towards a microservice EDA (Event Driven Architecture). It's been over 2 years in the making but we are getting there (although we still have 2 hefty monoliths to break down and a few databases acting as integration points to eliminate). We are a Microsoft shop using VS2015, C# 6, .NET 4.6, NServiceBus and plenty of Azure services. I'm sure you have read elsewhere that you don't move towards microservices without adequate automation; well, we have invested heavily in this over the last few years and it's now paying off big time. It feels like we have a new service or API deployed to prod every other week at the moment.

There are many benefits to a microservice style of architecture, which I'm not going to list here; this post is more about automation and tooling in the .NET landscape.

The tools are not the job

Sometimes you get complaints that you are spending too long on DevOps-related activities. That 'the tools are not the job', implying that you should be writing code, not tooling around the code! That you should be delivering features, not spending so much time getting things into production! This might have been true in the good old days when you had one monolith, maybe two, with shared databases. It used to be easy to build on your dev machine and FTP the build up to 'the' live server (better yet, MSDeploy or publish from Visual Studio), then just hack around with the config files: click, click, you're done. In those days you could count on Moore's law to help your servers scale up as you increased load; you accepted the cost of bigger and bigger servers if you needed them.

But in this modern era scaling out is the more efficient and cheaper solution, especially with the cloud. Small services enable you to scale the parts of the system that need it rather than the whole system. But this architectural style comes at a cost; imagine hand-deploying your service to each node running your software (there could be tens)!
We currently have over 70 core services/APIs and over 50 shared libraries (via NuGet). We run operations in 2 different regions, each with a different configuration and different challenges and opportunities. Each region has a prod and a preprod environment. We have a CI system and a test environment which is currently being rebuilt.
So 'the tools are not the job': maybe true, we do write a lot of C#, but without the tooling we would be spending most of our time building things, deploying things and fixing the things we pushed to prod in an incorrect state.

Technology, Pipelines and Automation

We are a Microsoft shop dealing mainly with C# 6, .NET 4.6, ASP.NET MVC, Web API, SQL Server, NServiceBus, Azure, Rackspace and PowerShell, lots of PowerShell.

We use GoCD from ThoughtWorks as our CI/CD tool of choice, and consequently have invested heavily in this tool chain over the last 2 years. Below is a 10,000-foot view of the 400+ pipelines on our Go server (rapidly moving towards 500).
https://go.....com:8154/go/pipelines
Below is a rough breakdown of the different pipelines and their purposes.

01 Nuget packages - 50+

The nuget packages mainly contain message definitions used for communications between services on the service bus (MSMQ and Azure Queues). Generally each service that publishes messages has a nuget package containing the messages so that we can easily communicate between services with a shared definition.
We also have a number of shared utility packages, these are also nuget packages.
These pipelines not only build and test the packages but also publish them to our internal NuGet feed on manual request. The whole process of getting a new package into our NuGet package manager is automated, very quick and painless.

These pipelines have 3 stages:
1. Build - After every commit we build 2 artifacts, the nuget package file and a test dll. These 2 artifacts are stored in the artifact repository.
2. Test -  Upon a green build the tests run.
3. Publish - If the tests are green we have a push button publish/deploy to our internal nuget feed.

02 Automation packaging - 3

This contains the PowerShell that is needed to perform builds and deploys. These packages are deployed by Go to a common location (see section 05) on all the servers that run agents.

03 Databases - 14 

We actually have over 20 databases, but using CI to control their deployment is only just gaining traction; we are currently building this side up, so not all DBs are in the CI pipeline yet. Most are still deployed by hand using Redgate tooling, which makes it bearable, but we are slowly moving to an automated DB deploy pipeline too.

04 Builds - 70+

Every service/API has a build pipeline that is kicked off by a check-in to either SVN or Git. They mainly consist of 2 stages: Build and Test.
Build not only compiles the code into DLLs but also transforms the configuration files and creates build and test artifacts, along with version info files to enable us to easily see the version of the build when it is deployed on a running server.
Test usually consists of two parts, unit and integration. Unit tests don't access any external resources (databases, file systems or APIs); integration tests do. Simples.
Only pipelines with a green build and test stage are allowed to be promoted to test or preprod environments.

05 Automation deployment - 10+

These pipelines deploy the automation defined in section 02 to all the Go agents. Region one has 3 servers in each of prod and preprod, so 6 agents. Region two has 2 in each of prod and preprod, and a further 2 that contain services that are deployed by hand (legacy systems still catching up). Then we have 2 build servers and some test servers too. All these servers have dependencies automatically pushed to them when the automation (which is mainly PowerShell) changes. All these pipelines are push-button rather than fully automatic; we want to keep full control of the state of the automation on all the servers.

06 Region one prod - 40+

07 Region one preprod - 40+

08 Region two prod - 50+

09 Region two preprod - 60+

Each region has 2 near-identical environments: one for preparing and testing the new software (preprod) and one running the actual system (prod). They are generally the same but for a small number of services that can't be run in preprod (generally gateways to external 3rd parties), and also services that are still being built and not yet in prod.
These pipelines generally take one of 2 forms: APIs and services.
The deployment of said APIs and services is documented here: http://foldingair.blogspot.co.uk/2014/10/adventures-in-continuous-delivery-our.html

Since that blog article was written we have started to move more and more to Azure PaaS, primarily app services and web jobs, with a scattering of VMs for good measure, but in general the pattern of deployment stands.

10 Region two test system - 60+

11 Region two test databases - 6

We are currently in the process of building up a new test system (services and databases) which is why the stages are not showing color as yet. In the end we will be deploying every piece of software every night and running full regression tests against it in the early morning to ensure everything still integrates together as it did yesterday.

Totals

432 pipelines and 809 stages (that was 4 weeks ago when I started drafting this post; I know, I know; it is now 460+ pipelines and 870+ stages). I wonder if this is a lot? But if you are truly doing microservices it's inevitable, isn't it?

The value stream map of a typical service  

Typically one service or API will be associated with 6 pipelines and potentially pull dependencies from others via nuget or other related pipelines. A typical value stream map is shown below:

Typical Value Stream Map

The circle on the left is source control, detailing the check-in that triggered the build (the first box on the left). The 5 deployment pipelines run off this: test is shown at the bottom, pulling directly from the build artifact. The 4 above are the 2 regions, with prod on the right pulling from preprod in the middle, which in turn pulls the artifact from the build.

This gives us great audit functionality, as you can easily see which builds were deployed to prod and when. Most importantly, you can't even deploy to prod unless you have first gone through preprod, you can't go to preprod if any tests failed, and the tests will only run if the code actually built.

Config

Config is also dealt with as part of the build and deploy process. Config transforms are run at the build stage and then the deployment selects the config to deploy using a convention-based approach, meaning all deployments are the same (templated) and config is easy to set up in the solution itself.
Whilst dealing with config in this manner isn't the perfect solution, it has served us really well so far, giving us confidence that all the configs of all services are in source control. We are really strict with ourselves on the servers too: we do not edit live configs on the servers, we change them in source control and redeploy. This does have its issues, especially if you are only tweaking one value and there are other code changes now on trunk, but we manage, and we are currently exploring other options that will allow us to separate the code from the config while keeping the strict auditability that we currently have.

Monitoring and alerts

Monitoring a monolith is relatively straightforward: it's either up and working or it's not! With many small services there are more places for problems to arise, but on the flip side the system is more resilient, as one problem won't cause the whole thing to come tumbling down. For this reason monitoring and alerts are vitally important.

Monitoring and alerts are handled by a combination of things: NServiceBus ServicePulse, Splunk, Go pipelines (Cradiator), Monitor.Us, Raygun, custom PowerShell scripts, SQL monitoring, Azure monitoring, Rackspace monitoring and other server monitoring agents. All of these feed notifications and alerts into Slack channels that developers monitor on their laptops at work and on our phones out of hours. We don't actually get too many notifications flowing through, as we have quite a stable platform given the number of moving parts, which enables us to actually pay attention to failures and be proactive.

Summary

With each region we are planning to bring on board there are ~120-150 pipelines that need to be created; luckily the templating in GoCD is very good, and you can easily clone pipelines from other regions. But managing all this is getting to be a bit much.

I just can't imagine even attempting this without something as powerful as GoCD.

So 'the tools are not the job'! Maybe, but they enable the job; without this automation we would be in the modern equivalent of DLL hell all over again.

Feedback

What are your thoughts reader? Does this sound sensible? Is there another way to manage this number of microservices across this number of servers/regions? Please sound off in the comments below.

Thursday, 21 January 2016

To stand or not to stand?

Over my working life I've sat on many chairs. As a consultant you have to make do with what the client provides, but since I've become permie, I've been campaigning for a standing desk. Recently I got my wish, well, in part.

I don't get on with the chairs at work. They are not cheap ones by any means, but by the end of the day I can really feel my neck and shoulders suffering. I'm not sure what it is with these ones in particular; I just don't get on with them. My cheaper chair at home is far better. I've actually been using a kneeling chair for the last year or so and that has been great; it really forces you to sit up straight. It's a little tiring at first but you get used to it, and I don't get the bad shoulders or neck that I used to get on the chair.

But like I said I really wanted to try a standing desk.

I managed to persuade the people that matter to give it a go, and so between us we knocked up an IKEA-hack standing desk made of a coffee table and a shelving unit (a low-budget experiment, ~£30).

We had to make it low enough to allow the shortest member of the team to use it but we can 'jack up' the shelf using books and boxes to get it higher for other team members.

I can use it all day, but I find my feet and legs get sore mid-afternoon and I really need a sit down. The desk is a long way from hydraulic, and it's a very big faff to move all the stuff down onto the sitting desk, so I've sacrificed my second screen for a mirrored setup so I can stand and sit as I please. I tend to do the morning stood up, then split the afternoon alternating sitting and standing as I feel the need. It's a compromise, but since the screens are big I can work split-screen rather than multi-monitor without losing too much.



Next stop walking desk :-) I really want to do this to alleviate the stress on the knees and feet. I've never tried it but from what I hear from people it's good. I can't see work going for that, maybe a wobble board though!

Monday, 30 November 2015

Do we need to deploy clean-up work if there is no additional functionality? i.e. no (perceived) business benefit.

Background

We recently did some work on a number of services where all we were doing was removing some old functionality that is no longer used (obsolete code): removing messages, handlers, classes and tests. I like deleting code; it makes things simpler, with less logic to break and fewer places for bugs to hide. Once we finished, we wanted to get all of this released to prod ASAP. We tested all of the affected areas in a large system-level test in the preprod environment, with all of the services (and others that had not changed) working together.

But we had push back. 

Push back

The question was: Why do you need to release this clean-up in advance of any further work?
By doing so you are making this an active rather than a passive deployment, with the associated extra risk and double the cost.
If you are removing unused code you can just deploy it with the next addition or change to the code, because by testing that you are implicitly testing the absence of the removed code. Even if the new code isn't affected, our deployment checklists cover that situation too; we have already double-checked that this removal/clean-up of code won't have an impact on production.

Rebuttal

I broke this down into a number of sections: the questions that I thought were being asked in the statement.

Question: Why do many releases instead of one, isn’t that more risky?
This is a question from the old skool of thought: releases are big bad things, so we should do as few of them as possible.
Answer: I would say many small releases are inherently less risky.
If (very unlikely, but possible) something goes wrong with a combined release, it will be less clear what caused the issue: was it the new functionality or the clean-up work that is the culprit?
If we release now, we know what to monitor over the coming days, and we have less to monitor.
If we don't deploy all 8 things, someone else will (at some point, in some cases many months in the future). This poor soul will need to decide whether the changes we made need testing, work out what the consequences of the changes are, and worry about whether there are other dependent services to deploy.
Each service is push button, so deploy time is small.
Rollback is not hard if we need it, with no business consequences at the moment. If in future we release alongside other functionality and the changes we have made break something, we would need to roll back the new functionality too.

Question: If we are removing code, why the need to release anything? There is no new functionality to release.
I guess this is a question about business value: no new business value, no need to release.
Answer: That is true; it's mostly removing old code and cleaning up. But it's just as critical, almost more so, to get this out in a small release sooner, as we may (again unlikely, but possible; we are human) have removed something we should not have.

Question: Won't it be more work to test it twice?
This assumes that we manually test everything on every release, where actually we only manually test what has changed, in conjunction with automated testing for the rest.
Answer: We already did good, full end-to-end testing of the 8 affected things last week. If we wait until next week or the week after, we will have to do the tests again, as the versions of the things to be released will all be different by then, so we would need to test the 8 deployable things full stack again = an extra day.

Other reasons for deployment.
The changes (what we did and why we did it) are still fresh in our minds. The longer we leave it, the less sure we are that we will be doing the right things.
It's not critical, but I'd prefer not to do a partial deployment (service-X is going to get released soon); I'd like the rest of the clean-up to be deployed too.
Ideally prod and preprod are as similar as possible (for environmental consistency and testing reasons). Any difference between the preprod test system and prod invalidates other testing efforts, because in prod services will be integrating with different versions of other services than they were in preprod, making like-for-like testing impossible.

Conclusion

I maintain there is business benefit in doing the deployment now, and deploying all 8 services at that. To be fair, businesses, managers, stakeholders, even developers (especially senior ones) have all seen their fair share of long deployments, failures and difficult rollbacks, leading ultimately to a fear of deployment. So it's natural to want to avoid the perceived risks. But, perversely, by restricting the number of deployments you are actually increasing the likelihood of future failure.
A core philosophy of the DevOps culture is to release early and often (continuous delivery). By doing the things you find painful more often, you master them and make them trivial, thereby improving your mean time to recovery.

The business benefit is ultimately one of developer productivity, testability and system uptime.

Tuesday, 17 November 2015

Splunk alerts to Slack using PowerShell on Windows

We use Splunk to aggregate all the logs across all our services and APIs on many different machines. It gives us an invaluable way to report on the interactions of our customers, from new business creation on the back-end servers running NServiceBus to the day-to-day client interactions on the websites and mobile apps.

We have been investing in more monitoring recently as the number of services (I hesitate to use the buzzword micro, but yes, they are small) is increasing. At the present pace I'd say there is a new service or API created almost every week. Keeping on top of all these services and ensuring smooth running is turning into a challenge, which Splunk is helping us to meet. When you add ServiceControl, ServicePulse and ServiceInsight from Particular Software (makers of NServiceBus), we have all bases covered.

We have recently added alerts to Splunk to give us notifications in Slack when we get errors.

The Setup

We are sending alerts from Splunk to Slack using batch scripts and PowerShell.

Splunk Alerts

First, set up an alert in Splunk; this Splunk video tells you how to create an alert from search results. We are using a custom script, which receives arguments as documented here. Our script consists of 2 parts: a bat file and a PowerShell file. The batch file calls the PowerShell, passing on the arguments.

SplunkSlackAlert.bat script in C:\Program Files\Splunk\bin\scripts
@echo off
powershell "C:\Program` Files\Splunk\bin\scripts\SplunkSlackAlert.ps1 -ScriptName '%SPLUNK_ARG_0%' -NEvents '%SPLUNK_ARG_1%' -TriggerReason '%SPLUNK_ARG_5%' -BrowserUrl '%SPLUNK_ARG_6%' -ReportName '%SPLUNK_ARG_4%'"

SplunkSlackAlert.ps1 lives alongside it:
param (
   [string]$ScriptName = "No script specified",
   [string]$NEvents = 0,
   [string]$TriggerReason = "No reason specified",
   [string]$BrowserUrl = "https://localhost:8000/",
   [string]$ReportName = "No name of report specified"
)

$body = @{
   text = "Test for a parameterized script `"$ScriptName`" `r`n This script retuned $NEvents and was triggered because $TriggerReason `r`n The Url to Splunk is $BrowserUrl `r`n The Report Name is $ReportName"
}

#Invoke-RestMethod -Uri https://hooks.slack.com/services/AAAAAAAAA/BBBBBBBBB/CCCCCCCCC -Method Post -Body (ConvertTo-Json $body)

Slack Integration

You can see the call to the Slack API in the Invoke-RestMethod; the Slack documentation for using the incoming web hook is here. There is quite a rich amount of customization that can be performed in the JSON payload, so have a play.
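
For example, a slightly richer, colour-coded payload using Slack's message attachments might look like this (the channel, username and colour are just illustrative, and $slackWebHookUri stands in for your incoming web hook URL):
# A richer alert using Slack message attachments; values are illustrative.
$body = @{
   channel     = "#splunk-alerts"
   username    = "splunk"
   attachments = @(
      @{
         color  = "danger"
         title  = $ReportName
         text   = "Returned $NEvents events because $TriggerReason"
         fields = @(
            @{ title = "Search"; value = $BrowserUrl; short = $false }
         )
      }
   )
}
Invoke-RestMethod -Uri $slackWebHookUri -Method Post -Body (ConvertTo-Json $body -Depth 5)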

Before you can actually use this you must first set up the Slack integration as documented here, which requires you to have a Slack account.

The fruits of our labor:


All the script code is given in my gist here.

Credits:

Thanks to my pair Ruben for helping on this, good work.