Saturday, 6 October 2012

How to update/install Node on Ubuntu

I needed to upgrade Node.js on my Ubuntu dev machine but could not find any good instructions on the internet. I tried several suggestions and finally got it working using an amalgamation of a few blogs.
My system setup before the upgrade:
ubuntu64:~$ which node
/usr/local/bin/node
ubuntu64:~$ node -v
v0.4.9

First, make sure your system is up to date:
ubuntu64:~$ sudo apt-get update
ubuntu64:~$ sudo apt-get install git-core curl build-essential
            openssl libssl-dev

Then clone the Node.js repository from GitHub:
ubuntu64:~$ git clone https://github.com/joyent/node.git
ubuntu64:~$ cd node

I wanted the latest tagged version
ubuntu64:~/node$ git tag
....
....big list of all the tags
....
ubuntu64:~/node$ git checkout v0.9.2

Then I removed the old version of node
ubuntu64:~$ which node
/usr/local/bin/node
ubuntu64:~$ cd /usr/local/bin
ubuntu64:/usr/local/bin$ sudo rm node

Now to install the desired version, in my case v0.9.2:
ubuntu64:/usr/local/bin$ cd ~/node
ubuntu64:~/node$ ./configure
....
ubuntu64:~/node$ make
....
ubuntu64:~/node$ sudo make install
....

Then I had to run the following to update the profile
ubuntu64:~/node$ . ~/.profile
Finally, confirm that node is in fact upgraded and, as a bonus, npm has magically been installed too :-)
ubuntu64:~/node$ which node
/usr/local/bin/node
ubuntu64:~/node$ node -v
v0.9.2
ubuntu64:~/node$ which npm
/usr/local/bin/npm
ubuntu64:~/node$ npm -v
1.1.61

Wednesday, 22 August 2012

Continuous Integration performance testing. An easily customisable solution.


Using JMeter to profile the performance of your web application and visualise performance trends, all within the CI pipeline.


The full solution as outlined here can be found on my GitHub repository at https://github.com/DamianStanger/CIPerformance

Introduction

Most companies care about the performance of their web sites and web apps, but the testing of that performance is often left until the last minute, in the hope that the devs have been writing performant code for the last x months of development. I don't know why this is so often the way. If performance really is a major non-functional requirement (NFR) then you have to test it as you go; you can't leave it until just before deployment and then, when you find performance is not good enough, try to hack in quick fixes. You can't just hack in performance after the fact; doing it well can take a substantial change to the design.

On our team we have been performance profiling each important page of our app since month one of development; we are now live and working towards the fourth major release. I (and the team) have found this continuous performance testing invaluable. Here is the performance graph as it stood a few weeks ago:

The process outlined below is not a method for stress testing your app. It's not designed to calculate the load that can be applied; instead it's used to see the trend in the performance of the app. Has a recent check-in caused the home page to perform like a dog? Are any N+1 DB selects or recursive functions causing trouble? It's a method of getting quick feedback within the CI pipeline, minutes after a change is checked in.

The process

1. When we check in, our CI box (TeamCity) runs the build (including javascript tests, unit tests, integration tests, functional tests and acceptance tests); if all of this is successful then the performance tests are kicked off.
2. Tear down the DB and restore a fresh copy, so we always have the same data for every run. This DB has a decent amount of data in it, simulating live in terms of volume and content.
3. Kick the web apps to prepare them for the performance tests; this ensures IIS has started up and the in-memory caches are primed (a sketch of such a warm-up step is shown after this list).
4. Run the JMeter scripts.
a. There are numerous scripts which simulate the load generated by different categories of user. For example, a logged-out user will have a different performance profile to a fully subscribed user.
b. We run all the scripts in serial as we want to see the performance profile of each type of user on each site we run.
5. The results from each run are processed by a PowerShell script which extracts the data from the JMeter log files (.jtl) and writes the results into a SQL Server database (DB). There is one record per page per test run.
6. We have a custom MVC app that pulls this data from the DB (using Dapper) and displays it to the team on a common monitor (using JSON and RGraph) that is always updating. We see instantly after we have checked in whether we have affected performance, for good or bad. We could break the build if we wanted but decided this was a step too far, as it can sometimes take a day or two to fix a poorly performing aspect of the site.

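The warm-up in step 3 is nothing clever. A minimal sketch of the idea looks something like this; the URLs are placeholders for illustration, not the pages from our real pipeline:

# Hit the key pages once so IIS has spun up and the in-memory caches are primed
# before JMeter starts measuring. The URLs below are placeholders.
$pagesToWarm = @(
    "http://localhost:8080/",
    "http://localhost:8080/search",
    "http://localhost:8080/account"
)
foreach ($url in $pagesToWarm) {
    try {
        $client = New-Object System.Net.WebClient
        $client.DownloadString($url) | Out-Null   # we only care that the request completes
        Write-Host "Warmed $url"
    } catch {
        Write-Host "Warm-up request to $url failed: $_"
    }
}
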
A stripped-down version is available on my GitHub account. Run the PowerShell script a few times and then run the MVC app and you should see something like the following:

The juicy bits (interesting bits of code and descriptions)

Powershell script (runTest.ps1)

• Calling out to JMeter from PowerShell on line 112:
& $jmeter -n -t $test_plan -l $test_results -j $test_log
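
For context, those variables are just paths set up earlier in runTest.ps1. Illustrative values (not the exact paths from the repo) might look like:

# Illustrative values only - the real paths are defined in runTest.ps1
$jmeter       = "C:\tools\jmeter\bin\jmeter.bat"    # the JMeter launcher
$test_plan    = ".\TestPlans\publicSite.jmx"        # the JMeter script to run
$test_results = ".\Results\publicSite.jtl"          # where JMeter writes its sample log
$test_log     = ".\Results\publicSite.log"          # JMeter's own run log

The -n flag runs JMeter in non-GUI mode, -t points at the test plan, -l is the sample results file (.jtl) and -j is JMeter's own log.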

• Parse JMeter results on line 133
[System.Xml.XmlDocument] $results = new-object System.Xml.XmlDocument
$results.load($file)
$samples = $results.selectnodes("/testResults/httpSample | /testResults/sample/httpSample")


Then iterate all the samples and record all the page times and errors.
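
A cut-down sketch of that loop; the real code lives in runTest.ps1 and the shape of the per-page stats object is simplified here:

# Aggregate the elapsed times and error counts per page label.
$page_statistics = @{}
foreach ($sample in $samples)
{
    $label   = $sample.lb              # page name as labelled in the JMeter plan
    $elapsed = [int]$sample.t          # response time in milliseconds
    $success = [bool]::Parse($sample.s)

    if (-not $page_statistics.ContainsKey($label)) {
        $page_statistics[$label] = @{ Times = @(); Errors = 0 }
    }
    $page_statistics[$label].Times += $elapsed
    if (-not $success) { $page_statistics[$label].Errors += 1 }
}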

• Write results to DB on line 171
$conn = New-Object System.Data.SqlClient.SqlConnection($connection_string)
$conn.Open()
foreach($pagestat in $page_statistics.GetEnumerator())
{
    $cmd = $conn.CreateCommand()
    $name = $pagestat.Name
    $stats = $pagestat.Value
    $cmd.CommandText = "INSERT Results VALUES ('$start_date_time', '$($name)',
    $($stats.AverageTime()), $($stats.Max()), $($stats.Min()), $($stats.NumberOfHits()),
    $($stats.NumberOfErrors()), $test_plan)"
    $cmd.ExecuteNonQuery()
}

JMeter scripts

You can look up JMeter yourself to find suitable example test scripts. The project posted here just has a very simple demo script which hits Google and Bing over and over; you can replace it with any JMeter script you like. The DB and the web app are page and site agnostic, so it should be easy to swap in your own scripts, and it will pick up your data and just work.
I recommend testing all the critical pages in your app, but I find the graphs get too busy with more than 10 different lines (pages) on them. If you want to test more stuff, add more scripts and graphs rather than have loads of lines on one graph.
The generic solution given here has two scripts, but you can have as many as you like. Two would be a good choice if you had a public-facing site and an editor/admin site, which have different performance profiles and pages. But in the end it's up to you to be creative in the use of your scripts and test what really needs testing.

The results DB

The DB is really simple. It consists of just one table which stores a record per page per test run. This DB needs creating before you run the script for the first time. The file Database.sql will create it for you in SQL server.
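
For reference, the shape of the Results table can be inferred from the INSERT statement in runTest.ps1; Database.sql is the authoritative definition and the column names and types below are partly a guess:

# Assumed shape of the Results table, inferred from the INSERT in runTest.ps1.
$create = @"
CREATE TABLE Results (
    RunDate        DATETIME,
    Url            NVARCHAR(255),
    AverageTime    INT,
    MaxTime        INT,
    MinTime        INT,
    NumberOfHits   INT,
    NumberOfErrors INT,
    TestPlan       NVARCHAR(255)
)
"@
$conn = New-Object System.Data.SqlClient.SqlConnection("Data Source=(local); Initial Catalog=PerformanceResults; Integrated Security=SSPI")
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = $create
$cmd.ExecuteNonQuery() | Out-Null
$conn.Close()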

The MVC app

Data layer, Dapper

Dapper (a micro ORM installed through NuGet) is used to get the daily results in the resultsRepository class:

var sqlConnection = new SqlConnection("Data Source=(local); Initial Catalog=PerformanceResults; Integrated Security=SSPI");
sqlConnection.Open();
var enumerable = sqlConnection.Query(@"
SELECT Url, AVG(AverageTime) As AverageTime, CAST(RunDate as date) as RunDate FROM Results
    WHERE TestPlan = @TestPlan
    GROUP BY CAST(RunDate as date), Url
    ORDER BY CAST(RunDate as date), Url", new { TestPlan = testPlan });
sqlConnection.Close();
return enumerable;

The view, JSON and RGraph

In this sample code there are four different graphs on the page: two for Google (test plan 1) and two for Bing (test plan 2). The heartbeat graphs show a data point for every performance run over the last two weeks, so you can see instantly if there has been a bad run. The daily averages show a data point per day for all the performance data in the DB.
There are four canvases that contain the graphs, all drawn using RGraph from JSON data built from the data pulled out of the DB. It's the javascript function configureGraph that does this work with RGraph; for details of how to use RGraph see the appendix.
The JSON data is created from the model using LINQ in the view as such:
dailyData: [@String.Join(",", Model.Daily.Select(x => "[" + String.Join(",", x.Results.Select(y => y.AverageTimeSeconds).ToList()) + "]"))],

This will create something like the following depending on the data in your DB:
dailyData: [[4.6,5.1],[1.9,2.2],[4.0,3.9],[9.0,9.0]],
Where the inner numbers are the data points of the individual lines; so the data above represents four lines, each with two data points.

Customising things for your own purposes

So you would like to customise this whole process for your own purposes? Here are the very simple steps:
  1. Edit the function CreateAllTestDefiniotions in RunTest.ps1 to add in any JMeter scripts that you want to run as new TestPlanDefinitions.
  2. Change or add to the JMeter scripts (.jmx) to exercise the sites and pages that you want to test.
  3. Add the plan definitions to the method CreateAllPlanDefinitions of the class PlanDefinition in the performance stats solution. This is all you need to edit for the web interface to display all your test plans. The graphs will automatically pick up the page names that have been put into the configured JMeter scripts.
  4. Optionally change the yMax of each graph so the performance lines are drawn at a scale that suits your results.

Conclusion

We as a team have found this set-up very useful. It has highlighted many issues to us, including: N+1 select issues, Combres configuration problems, and any number of issues with business logic, usually involving enumerations or recursive functions.
When set up so that the page refreshes every minute, it acts as a constant reminder to the team to stay on top of the performance NFR.

A note on live performance / stress testing

Live performance testing is a very different beast altogether; the objective is to see how the system as a whole reacts under stress and to determine the maximum number of page requests that can be served simultaneously. This is different to the CI performance tests outlined above, which run on a dev box and are only useful as a relative measure of how page responsiveness changes as new functionality is added.

Appendix

JMeter - https://jmeter.apache.org/
Dapper - installed from NuGet
RGraph - http://www.rgraph.net/
GitHub - https://github.com/DamianStanger/CIPerformance
VS2012 - https://www.microsoft.com/visualstudio/11/en-us

Tuesday, 15 May 2012

A PowerShell script to count your lines of source code

We have been thinking about code quality and metrics of late, and since I'm also learning more PowerShell I decided to write a little script to do it for me. It basically finds all the code files in the project directories and counts the lines and files:

Here it is:

$files = Get-ChildItem . -Recurse | `
    where-Object {$_.Name -match "^.+\.cs$"}
$processedfiles = @();
$totalLines = 0;
foreach ($x in $files)
{
    $name= $x.Name;
    $lines = (Get-Content ($x.Fullname) | `
        Measure-Object -Line ).Lines;
    $object = New-Object Object;
    $object | Add-Member -MemberType noteproperty `
        -name Name -value $name;
    $object | Add-Member -MemberType noteproperty `
        -name Lines -value $lines;
    $processedfiles += $object;
    $totalLines += $lines;
}
$processedfiles | Where-Object {$_.Lines -gt 100} | `
    sort-object -property Lines -Descending
Write-Host ... ... ... ... ...
Write-Host Total Lines $totalLines In Files $processedfiles.count


Get-ChildItem at the top gets all the .cs files from the current working folder and below.
Measure-Object -Line is used to get the number of lines in the file currently being processed.
New-Object creates an object, and the two Add-Member calls dynamically add properties to it for the file name and the line count.
Each new object is appended to the array of processed files.
Finally, all the files with a line count greater than 100 (an arbitrary threshold; I only care about files longer than roughly two screens worth of text) are selected from the array and printed in descending order of line count.

My results:
Our current project has a total of 154,068 lines of code in .cs files.
2,559 .cs files, of which 312 have a line count greater than 100 lines.
16 files are over 400 lines in length, but none of those were in the main product (all the worst classes are test classes and helpers, which are not production code).

I also wondered about the state of my views:
320 .cshtml files with a total of 10,958 lines; the vast majority are under 100 lines and only 6 are over 150.
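
The view figures came from the same script with just the file filter (and the totalling) changed, something like this:

$files = Get-ChildItem . -Recurse | `
    Where-Object {$_.Name -match "^.+\.cshtml$"}
$totalLines = ($files | ForEach-Object { (Get-Content $_.FullName | Measure-Object -Line).Lines } | Measure-Object -Sum).Sum
Write-Host Total Lines $totalLines In Files $files.count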

Tuesday, 17 January 2012

LINQ performance problems with deferred execution causing multiple selects against the DB


We have some really good performance tests that run on every check-in, providing
the team with an excellent view of how the performance of the software changes
as the code base changes. We recently saw a drop in performance and tracked it
down to a problem in our data layer.

The problem we encountered was within LINQ to SQL, but it will be a problem with
other flavours of LINQ too if you're not careful.

Personally I consider LINQtoSQL to be dangerous for a number of reasons and
would actually prefer not to be using it, but we are where we are, and as a
team we just need to be wary of LINQToSQL and its quirks.

This quirk is where the deferred execution of a LINQ to SQL enumeration causes
multiple selects against the DB, as this code demonstrates:

public IList<IndustrySector> GetIndustrySectorsByArticleId(int articleId)
{
  var industrySectorsIds = GetIndustrySectorIds(articleId);
  return ByIds(industrySectorsIds);
}

private IEnumerable<int> GetIndustrySectorIds(int articleId)
{
  // This sets up a deferred LINQ to SQL query; nothing hits the DB yet.
  var articleIndustrySectorsDaos = databaseManager.DataContext.ArticleIndustrySectorDaos.Where(x => x.ArticleID == articleId);
  return articleIndustrySectorsDaos.Select(x => x.IndustrySectorID);
}

public IList<IndustrySector> ByIds(IEnumerable<int> industrySectorIds)
{
  // Contains() enumerates the deferred query, so the same select is executed against the DB for every item All() yields.
  return All().Where(i => industrySectorIds.Contains(i.Key)).Select(x => x.Value).ToList();
}


public IEnumerable<IndustrySector> All()
{
  //work out all the industry sectors valid for this user in the system, this doesn't make a DB call
}

So in the end this all causes a number of identical queries to be fired against the DB:
the deferred industrySectorsIds query is re-executed every time Contains() is evaluated
inside ByIds(), one call for each item that All() yields.
This is the select we were seeing:

exec sp_executesql N'SELECT [t0].[IndustrySectorID]
FROM [dbo].[tlnk_Article_IndustrySector] AS [t0]
WHERE [t0].[ArticleID] = @p0',N'@p0 int',@p0=107348

Forcing the ByIds() method to retrieve all the ids from the DB before iterating All()
means that they are loaded into memory once only.

public IList<IndustrySector> ByIds(IEnumerable<int> industrySectorIds)
{
  // ToList() executes the deferred query once; Contains() now runs against an in-memory list.
  var sectorIds = industrySectorIds.ToList();
  return All().Where(i => sectorIds.Contains(i.Key)).Select(x => x.Value).ToList();
}

Now you only get one call to the DB. Thanks LINQtoSQL, you're great.