Wednesday, March 5, 2008

C# - String Concatenation

 

One way to format C# strings is to add strings together using the plus-sign operator, such as:

string fileStats =
    "Bytes=" + bytes.ToString() +
    ", Pages=" + pages.ToString() +
    ", Words=" + words.ToString();

This code will correctly produce the desired output, but it is inefficient and does not scale well.
The problem is that in C#, strings are immutable. This means that once a string is created, its value cannot be changed. Instead, when you modify a string, a whole new string is created, and a reference to the new string is returned.
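Here is a minimal sketch of what immutability means in practice. It is written as a modern C# top-level program (C# 9 or later, which is an assumption; the post itself predates that syntax), and the variable names and sample value are illustrative only.

```csharp
using System;

// Two variables start out referring to the same string instance.
string s = "Bytes=";
string alias = s;

// "Modifying" s builds a brand-new string and rebinds s to it;
// the instance that alias still refers to is left untouched.
s += "2375";

Console.WriteLine(s);                          // Bytes=2375
Console.WriteLine(alias);                      // Bytes=
Console.WriteLine(ReferenceEquals(s, alias));  // False

// Methods such as ToUpperInvariant also return a new string
// instead of changing the one alias refers to.
string upper = alias.ToUpperInvariant();
Console.WriteLine(upper);                      // BYTES=
Console.WriteLine(alias);                      // Bytes=
```

The `+=` operator looks like an in-place change, but it only reassigns the variable. Any other reference to the old string still sees the old value.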
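String literals are also interned. If the same literal is used in multiple places within an application, all references to it point to the same string instance in memory. Strings created at runtime, such as the result of a concatenation, are not interned automatically, although String.Intern can add them to the pool.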
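Interning can be observed with ReferenceEquals. The sketch below follows the same pattern as the String.Intern documentation example, using an illustrative literal. It assumes a C# 9+ top-level program.

```csharp
using System;

// Identical literals in the same program share one interned instance.
string a = "Pages=";
string b = "Pages=";
Console.WriteLine(ReferenceEquals(a, b));  // True

// A string built at runtime has the same value but is a separate object.
string c = new string(new[] { 'P', 'a', 'g', 'e', 's', '=' });
Console.WriteLine(a == c);                 // True (value equality)
Console.WriteLine(ReferenceEquals(a, c));  // False

// String.Intern returns the pooled instance for that value,
// which here is the literal that a already refers to.
string d = string.Intern(c);
Console.WriteLine(ReferenceEquals(a, d));  // True
```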
Back to the example. Since strings are immutable, the code above will create a new string for each addition in the expression, resulting in four extra and unnecessary strings every time the code executes:

"Bytes=2375"
"Bytes=2375, Pages="
"Bytes=2375, Pages=3"
"Bytes=2375, Pages=3, Words="
"Bytes=2375, Pages=3, Words=477"

The first four strings are essentially throw-away strings that are unlikely to be used again, so they waste memory and garbage collection cycles.

The correct approach is to use the String.Format method, such as:

string fileStats = String.Format(
    "Bytes={0}, Pages={1}, Words={2}",
    bytes, pages, words );
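What String.Format offers beyond plain joining is the composite-format syntax inside the placeholders. That is what the "special formatting requirements" in the next paragraph refer to. The sketch below assumes `bytes`, `pages` and `words` are ints holding the sample values shown above, and it uses the invariant culture so the separators are predictable.

```csharp
using System;
using System.Globalization;

// Assumed types and sample values, matching the output listed above.
int bytes = 2375, pages = 3, words = 477;

// Plain placeholders: each argument gets its default formatting.
string plain = String.Format(
    CultureInfo.InvariantCulture,
    "Bytes={0}, Pages={1}, Words={2}",
    bytes, pages, words);

// Format items with specifiers: N0 adds thousands separators,
// and ",4" right-aligns the value in a field four characters wide.
string formatted = String.Format(
    CultureInfo.InvariantCulture,
    "Bytes={0:N0}, Pages={1,4}, Words={2:N0}",
    bytes, pages, words);

Console.WriteLine(plain);      // Bytes=2375, Pages=3, Words=477
Console.WriteLine(formatted);  // Bytes=2,375, Pages=   3, Words=477
```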

If you are just concatenating strings and don't have any special formatting requirements, then String.Concat is even faster:

string fileStats = String.Concat(
    "Bytes=", bytes,
    ", Pages=", pages,
    ", Words=", words );

In our testing, String.Format can be 10-50% faster than string addition, and String.Concat can be 50-500% faster, depending on string length and number of iterations. Both options use less memory and have less impact on garbage collection than string addition.
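One rough way to run this kind of comparison on your own runtime is a Stopwatch loop like the one below. It is a sketch, not the author's test harness. The iteration count and int sample values are assumptions, a single run is noisy (it also includes JIT warm-up), and current compilers may turn the `+` chain into a String.Concat call, so results can differ from the figures above.

```csharp
using System;
using System.Diagnostics;

// Assumptions: int sample values and an arbitrary iteration count.
const int Iterations = 100_000;
int bytes = 2375, pages = 3, words = 477;

string viaAddition = "", viaFormat = "", viaConcat = "";

// String addition, as in the first example.
var sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
    viaAddition = "Bytes=" + bytes.ToString() + ", Pages=" + pages.ToString()
                + ", Words=" + words.ToString();
long additionMs = sw.ElapsedMilliseconds;

// String.Format with numbered placeholders.
sw.Restart();
for (int i = 0; i < Iterations; i++)
    viaFormat = String.Format("Bytes={0}, Pages={1}, Words={2}", bytes, pages, words);
long formatMs = sw.ElapsedMilliseconds;

// String.Concat with mixed string and int arguments.
sw.Restart();
for (int i = 0; i < Iterations; i++)
    viaConcat = String.Concat("Bytes=", bytes, ", Pages=", pages, ", Words=", words);
long concatMs = sw.ElapsedMilliseconds;

Console.WriteLine($"Addition: {additionMs} ms");
Console.WriteLine($"Format:   {formatMs} ms");
Console.WriteLine($"Concat:   {concatMs} ms");
Console.WriteLine(viaConcat);  // Bytes=2375, Pages=3, Words=477
```

All three loops build the same string, so any timing difference comes from how each approach builds it. A dedicated benchmarking tool would give more trustworthy numbers than a single Stopwatch run.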
So why are C# strings immutable? There is a lengthy discussion here and here. But in general, mutable strings would be a nightmare to manage in a multithreaded environment. Also, strings are stored as an array of characters, so modifying a string's length would require allocating a new character array anyway. Finally, mutable strings could pose a security risk by allowing software to modify database and system connection strings on the fly.

Source

3 comments:

  1. I'm not sure whether they changed something in the CLR to handle this, but your article isn't true any more.

    using System;

    public class Class1
    {
        private Class1()
        {
            var bytes = new byte[] {};
            int pages = 0;
            string words = "blalskad";
            string fileStats = "Bytes=" + bytes + ", Pages=" + pages + ", Words=" + words;
            fileStats = String.Concat("Bytes=", bytes, ", Pages=", pages, ", Words=", words);
        }
    }

    If you open it in Reflector or view the IL output directly, you will see that both lines compile to exactly the same code:

    string fileStats = string.Concat(new object[] { "Bytes=", bytes, ", Pages=", pages, ", Words=", words });
    fileStats = string.Concat(new object[] { "Bytes=", bytes, ", Pages=", pages, ", Words=", words });

  2. @dotnetchris Thanks for your comment.
    Are you working with 2.0 or 3.5?

  3. I agree with "dotnetchris". This blog post doesn't make much sense.
    For concatenating many strings (say, more than 10), you should use the StringBuilder class.

    See the following url:
    http://www.robinthomas.in/dotnet/stringbuilder-vs-string-concatenations/

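The StringBuilder suggestion in the last comment can be sketched as follows. The sketch contrasts repeated `+=` inside a loop, which does allocate a new string on every pass, with a StringBuilder that fills an internal buffer and produces one string at the end. The 20-element input is hypothetical, and "more than 10" is the commenter's rule of thumb rather than a measured cutoff.

```csharp
using System;
using System.Text;

// Hypothetical input: 20 small pieces to be joined with ", ".
string[] pieces = new string[20];
for (int i = 0; i < pieces.Length; i++)
    pieces[i] = "Page" + i;

// Repeated += inside a loop: each pass allocates a new, longer string
// and the previous one becomes garbage.
string naive = "";
for (int i = 0; i < pieces.Length; i++)
    naive += (i > 0 ? ", " : "") + pieces[i];

// StringBuilder appends into an internal, growable buffer and
// only produces a single string when ToString is called.
var sb = new StringBuilder();
for (int i = 0; i < pieces.Length; i++)
{
    if (i > 0) sb.Append(", ");
    sb.Append(pieces[i]);
}
string built = sb.ToString();

Console.WriteLine(built == naive);                      // True
Console.WriteLine(built == string.Join(", ", pieces));  // True
```

For a simple separator join like this one, String.Join produces the same result in one call. StringBuilder pays off when the pieces are produced incrementally, for example inside a loop with conditional logic.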