Tuesday, 17 January 2012

Linq performance problems with deferred execution causing multiple selects against the DB

We have some really good performance tests that run on every checkin providing
the team with an excelent view of how the performance of the software changes
due to different changes in the code base. We recently saw a drop in performance
and we tracked it down to a problem in our data layer.

The problem we encountered was within LINQ to SQL but will be a problem with
other types of LINQ if your not careful.

Personally i consider LINQtoSQL to be dangerous for a number of reasons and
would actually prefer not to be using it but we are where we are and we as a
team just need to be weary of LINQToSQL and its quirks.

This quirk is when the deferred execution of a linq to sql enumeration is
causing multiple selects against the DB.

As this code demonstrates.

public IList<IndustrySector> GetIndustrySectorsByArticleId(int articleId)
  var industrySectorsIds = GetIndustrySectorIds(articleId);
  return ByIds(industrySectorsIds);

private IEnumerable<int> GetIndustrySectorIds(int articleId)
  var articleIndustrySectorsDaos = databaseManager.DataContext.ArticleIndustrySectorDaos.Where(x => x.ArticleID == articleId);
  return articleIndustrySectorsDaos.Select(x => x.IndustrySectorID);

public IList<IndustrySector> ByIds(IEnumerable<int> industrySectorIds)
  return All().Where(i => industrySectorIds.Contains(i.Key)).Select(x => x.Value).ToList();

public IEnumerable<IndustrySector> All()
  //work out all the industry sectors valid for this user in the system, this doesn't make a DB call

So in the end this all causes an number of identical queries to be fired against the DB,
industrySectorsIds.count number of calls to the DB to be precise.
This is the select we were seeing:

exec sp_executesql N'SELECT [t0].[IndustrySectorID]
FROM [dbo].[tlnk_Article_IndustrySector] AS [t0]
WHERE [t0].[ArticleID] = @p0',N'@p0 int',@p0=107348

By forcing the ByIds() method to retreive all the ids from the DB before iterating All()
will mean that they are loaded into memory once only.

public IList<IndustrySector> ByIds(IEnumerable<int> industrySectorIds)
  var sectorIds = industrySectorIds.ToList();
  return All().Where(i => sectorIds.Contains(i.Key)).Select(x => x.Value).ToList();

now you only get one call to the DB, thanks LINQtoSQL, your great.