Wednesday 28 February 2018

Converting CRLF to LF for all files in a git repository

At work we currently have people who do their dev on linux laptops, linux VMs, windows and the WSL, which means that we need to be careful about the compatibility of files in git. We deploy to centos in all our production and pre-prod environments, so we always check in linux line endings.

But recently, when I was looking through some codez, I found a bit of a mix of files with LF and CRLF line endings and so wanted to make them all consistent with the LF linux standard.

git config


I don’t know how this happened exactly and didn’t narrow it down to any one commit, but just wanted it fixed. We should all have our git clients set to convert and check in linux line endings, which you can check with these commands:

git config --list --global
git config --list


You are looking for the setting core.autocrlf. This can take three values: true, false and input. Depending on the OS that you are using you need to ensure that you use the correct setting.

On windows it should be set to true. That is, check out windows style (CRLF) but check in linux style (LF).

On linux it needs to be set to false or input, as you don’t want files to contain windows line endings during development, so you check out with LF. You can also leave it as the default, which is false.

I make heavy use of WSL (windows subsystem for linux) as well as centos VMs running on virtualbox. WSL behaves like linux, so I have the default set, which is not to change the line endings going in or out. But you do have to be careful. If you change or create files using a windows editor (I use webstorm and sublime) then you could inadvertently check in windows line endings, so it might be best to use input. input will check out as is from the repo, but on check in will convert all line endings to LF, just in case a CRLF file was introduced.
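
As a rough sketch of setting and reading the value, the snippet below uses a throwaway repository so that nothing in your real configuration changes. The temporary directory and the choice of input are only for illustration; on a real machine you would more likely run git config --global core.autocrlf with whichever value suits your OS.

```shell
# Create a scratch repository so the global config is left untouched.
repo=$(mktemp -d)
git init -q "$repo"

# Set core.autocrlf for this repository only (use --global for all repositories).
git -C "$repo" config core.autocrlf input

# Read the value back to confirm what git will use in this repository.
autocrlf=$(git -C "$repo" config --get core.autocrlf)
echo "core.autocrlf is set to: $autocrlf"
```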

By the way, I love the WSL. I use it every day and prefer it to using a VM running linux; it works great for node dev.

Converting CRLF to LF


Anyway back to the main point of this post. We have some files with windows line endings mixed in with files with linux line endings. How to make them consistent? In particular how to make them all have linux line endings?

The difference is \r\n (windows) vs \n (linux). The tool sed is very good at finding strings in a file and replacing them with another.
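
Before changing anything, it can be handy to see which files actually contain a carriage return. This is only a sketch: it builds two sample files in a temporary directory (the file names are made up for the example) and assumes a grep that supports -r and -I, such as GNU grep.

```shell
# Build a small sample tree: one file with CRLF endings, one with LF endings.
dir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$dir/windows.txt"
printf 'one\ntwo\n' > "$dir/linux.txt"

# Hold a literal carriage return in a variable so it can be used as the pattern.
cr=$(printf '\r')

# -r recurses, -l prints only the names of matching files, -I skips binary files.
crlf_files=$(grep -rlI "$cr" "$dir")
echo "$crlf_files"
```

Only windows.txt should be listed. Against a real repository you would point grep at the project folder and exclude .git in the same way as the find command later in this post.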

sed is a stream editor for filtering and transforming text. We can give it a regex replacement and run that against a file to remove any carriage returns ('\r') from it.

sed -i 's/\r//g' myfilename.js

-i tells sed to do an in-place substitution. 's/\r//g' is a regex substitution that searches for carriage returns ('\r') and replaces them with nothing ('//'); the g flag applies this to every match on each line of the file.
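
To try the substitution safely first, here is a minimal sketch that runs it against a scratch file rather than real source. It assumes GNU sed (the default on centos and most linux distros); other sed implementations, such as the one on macOS, handle -i and \r differently.

```shell
# Create a scratch file with windows (CRLF) line endings.
tmp=$(mktemp)
printf 'first line\r\nsecond line\r\n' > "$tmp"

# Strip every carriage return in place (GNU sed syntax).
sed -i 's/\r//g' "$tmp"

# Build the expected LF-only content and compare the two byte for byte.
expected=$(mktemp)
printf 'first line\nsecond line\n' > "$expected"
if cmp -s "$tmp" "$expected"; then status="converted"; else status="unchanged"; fi
echo "$status"
```

If a file could legitimately contain a carriage return in the middle of a line, the pattern 's/\r$//' is a more conservative choice with GNU sed, as it only removes one at the end of a line.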

But we have hundreds of files across tens of nested directories. So we need to find all the files we want to 'fix' using the find command.

find . -type f -not -path './.git*' -not -path './node_modules*'

This will recursively list all files from the current directory, excluding any files in the .git or node_modules folders. Do remember to exclude your .git folder, as you will corrupt the repository if you run the substitution against files in there. Also exclude any package folders or binary folders. This depends on the environment you are working in; I'm currently doing node dev, so excluding node_modules is good enough for me.
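
It is worth running the find on its own first to check what it will pick up. The sketch below does that against a made-up folder layout in a temporary directory (src/app.js and node_modules/pkg/index.js are placeholder names), so you can see the exclusions working without touching a real project.

```shell
# Build a sample tree that mimics a small node project.
root=$(mktemp -d)
mkdir -p "$root/.git" "$root/node_modules/pkg" "$root/src"
touch "$root/.git/config" "$root/node_modules/pkg/index.js" "$root/src/app.js"

# Run the same find from inside the sample tree; only src/app.js should survive the filters.
found=$(cd "$root" && find . -type f -not -path './.git*' -not -path './node_modules*')
echo "$found"
# prints ./src/app.js
```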

All that remains is to put them together using the standard unix pipe operator and the xargs command, which allows you to build and execute command lines. It takes the output of find, splits it into file names on whitespace and appends them to the next command. We would use it thus:

find . -type f -not -path './.git*' -not -path './node_modules*' | xargs sed -i 's/\r//g'

If the folder contained 2 files xargs would build a command that looked like this:

sed -i 's/\r//g' ./file1.js ./file2.js
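
One caveat: because xargs splits its input on whitespace by default, a file name containing a space would be passed to sed as two separate names. A variant that should cope with this, assuming a find that supports -print0 and an xargs that supports -0 (GNU findutils does), is sketched below against a scratch folder. The file name "my file.js" is invented for the example.

```shell
# Build a sample tree containing a CRLF file with a space in its name.
root=$(mktemp -d)
mkdir -p "$root/src"
printf 'a\r\nb\r\n' > "$root/src/my file.js"

# -print0 and -0 separate file names with a NUL byte instead of whitespace.
(cd "$root" && find . -type f -not -path './.git*' -not -path './node_modules*' -print0 | xargs -0 sed -i 's/\r//g')

# Check whether any carriage return is left in the file.
cr=$(printf '\r')
if grep -q "$cr" "$root/src/my file.js"; then remaining="yes"; else remaining="no"; fi
echo "carriage returns remaining: $remaining"
```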

Voila!

All CRLF line endings are replaced with LF. You should be able to check this by using git diff to see the changes. Each changed line should show up in the unified diff like this:

diff --git a/file1.js b/file1.js
index 01ce825..f5f8e58 100644
--- a/file1.js
+++ b/file1.js
-old line with windows line endings^M
+old line with windows line endings


If you don’t see the ^M but just two lines that look the same, then there are a couple of tricks you can try.
git diff -R reverses the output. Apparently git does not always highlight removed white space, but will highlight added white space.
git diff | cat -v pipes the raw patch output from git diff to cat. cat with -v echoes the input to the console with non-display characters (like a carriage return) made visible.
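
To see what cat -v does without needing a diff to hand, this small sketch pipes a single sample line with a windows line ending through it. The text of the line is just the one from the diff above.

```shell
# Pipe a sample line ending in CRLF through cat -v to make the carriage return visible.
visible=$(printf 'old line with windows line endings\r\n' | cat -v)
echo "$visible"
# prints: old line with windows line endings^M
```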

Appendix

https://git-scm.com/docs/git-config
https://git-scm.com/docs/git-diff
https://manpages.debian.org/stretch/sed/sed.1.en.html
https://manpages.debian.org/stretch/findutils/xargs.1.en.html
https://manpages.debian.org/stretch/findutils/find.1.en.html