This website uses cookies

This website uses cookies to give you the best and most relevant experience. By continuing to browse this site, you are agreeing to our use of cookies. Learn More.

Removing Text Qualifiers from Numbers in Delimited Files or Strings

Importing delimited data as a bulk operation in Sql Server often requires the use of a format file, especially when the conversion of numeric strings into numeric data types gives rise to errors due to the presence of text qualifiers such as double-quotes. Creating a format file can be a tedious task if the number of columns is high. The approach described in this article looks at the removal of text qualifiers from delimited data and works well if the numeric cells have a non-empty value.

General import options

The Replace method of the Regex class can be used to match positive, negative or decimal numbers and modify any hit with something more appropriate via a delegate. The regular expression and the delegate can be modified to suit any specific needs. In the example below, the double quotes from numeric values is removed. The regular expression used can match positive, negative, integers or decimal numbers.

Removing double quotes from numeric values
var original = "1, \"ON\", \"204900\", \"43.710773500000002\", \"-79.726135299999996\", \"2013-02-01 00:00:00\"";
var modified = Regex.Replace(
                    original, //original content
                    @"(""-?\d*\.?\d+"")", //regular expression to match positive, negative, integers, decimals
                    delegate (Match m) { return m.Value.Replace("\"", ""); }, //replace quotes with empty string
                    RegexOptions.None
                );
Console.WriteLine($"orginal: {original}\nmodified: {modified}");

//output
//orginal: "1", "ON", "204900", "43.710773500000002", "-79.726135299999996", "2013-02-01 00:00:00"
//modified: 1, "ON", 204900, 43.710773500000002, -79.726135299999996, "2013-02-01 00:00:00"

A similar approach can be used in case of larger files with the help of StreamReader and StreamWriter instances.

Removing text qualifiers from large files
using (StreamReader reader = new StreamReader(@"Test.txt", System.Text.Encoding.GetEncoding("ISO-8859-1")))
using (StreamWriter writer = new StreamWriter(@"Test_Modified.txt", false, System.Text.Encoding.GetEncoding("ISO-8859-1")))
{
     writer.Write(Regex.Replace(
          reader.ReadToEnd(),  //original content
          @"(""-?\d*\.?\d+"")", //regular expression to match positive, negative, integers, decimals
          delegate (Match m) { return m.Value.Replace("\"", ""); }, //replace quotes with empty string
          RegexOptions.Multiline
     ));
}

A test using a 24MB file containing 250000 records as in the example above (see first code snippet original vs modified) completed in 2.5 seconds on my development machine (quad core i5 2.00GHz, 16GB RAM).