Handling CSV Files in .NET Core C#

Posted by

Often times for business applications, you need to provide the ability for a user to upload a spreadsheet of data, or handle a data export. Often, this data is in the form of a CSV (comma-separated value) file. When you are using an ORM like Entity Framework, it’s helpful to parse that data into a List<T> (where T is the destination type) before acting on the data. By approaching the problem as a conversion to List, you can leverage the same code for any type.

After some work, I created a static extension method off of Stream that allows you to do the conversion in a one liner:

List<User> list = stream.CsvToList<User>();

How does this happen? Well, the data flow within the extension method is as follows:

  1. Use a Stream Reader to get the content into a string variable from the stream
  2. Split the string by ‘\r’ and ‘\n’ to get the individual lines
  3. Build up a map for property to column
  4. Iterate over each row, create a new type T for that row
  5. Split the row by comma, and iterate over results (IE, the columns)
  6. Type convert the column to a property
  7. Set the property to the property on the previously mentioned type T
  8. After converting and setting each of the columns in a row to T, add T to the return list

Easy right? Here’s the code implementation:

public static List<T> CsvToList<T>(this Stream stream, 
    Dictionary<string, string> map = null, int startRowOffset = 0, int startColumnOffset = 0, int endRowOffset = 0, char delimeter = ',') where T : new()
{
    //DateTime Conversion
    var convertDateTime = new Func<double, DateTime>(csvDate =>
    {
        if (csvDate < 1)
            throw new ArgumentException("CSV dates cannot be smaller than 0.");
        var dateOfReference = new DateTime(1900, 1, 1);
        if (csvDate > 60d)
            csvDate = csvDate - 2;
        else
            csvDate = csvDate - 1;
        return dateOfReference.AddDays(csvDate);
    });

    //Leverage StreamReader
    using (var sr = new StreamReader(stream))
    {
        var data = sr.ReadToEnd();
        var lines = data.Split(new char[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries).Skip(startRowOffset);
        var props = typeof(T).GetProperties()
        .Select(prop =>
        {
            var displayAttribute = (DisplayAttribute)prop.GetCustomAttributes(typeof(DisplayAttribute), false).FirstOrDefault();
            return new
            {
                Name = prop.Name,
                DisplayName = displayAttribute?.Name ?? prop.Name,
                Order = displayAttribute == null || !displayAttribute.GetOrder().HasValue ? 999 : displayAttribute.Order,
                PropertyInfo = prop,
                PropertyType = prop.PropertyType,
                HasDisplayName = displayAttribute != null
            };
        })
        .Where(prop => !string.IsNullOrWhiteSpace(prop.DisplayName))
        .ToList();

        var retList = new List<T>();
        var columns = new List<CsvMap>();
        var startCol = startColumnOffset;
        var startRow = startRowOffset;
        var headerRow = lines.ElementAt(startRow).Split(delimeter);
        var endCol = headerRow.Length;
        var endRow = lines.Count();

        // Assume first row has column names
        for (int col = startCol; col < endCol; col++)
        {
            var cellValue = (lines.ElementAt(startRow).Split(delimeter)[col] ?? string.Empty).ToString().Trim();
            if (!string.IsNullOrWhiteSpace(cellValue))
            {
                columns.Add(new CsvMap()
                {
                    Name = cellValue,
                    MappedTo = map == null || map.Count == 0 ?
                        cellValue :
                        map.ContainsKey(cellValue) ? 
                            map[cellValue] : string.Empty,
                    Index = col
                });
            }
        }

        // Now iterate over all the rows
        for (int rowIndex = startRow + 1; rowIndex < endRow; rowIndex++)
        {
            var item = new T();
            columns.ForEach(column =>
            {
                var value = lines.ElementAt(rowIndex).Split(delimeter)[column.Index];
                var valueStr = value == null ? string.Empty : value.ToString().Trim();
    
                var prop = string.IsNullOrWhiteSpace(column.MappedTo) ?
                    null :
                    props.FirstOrDefault(p => p.Name.Trim().Contains(column.MappedTo));
                
                // Handle mapping by DisplayName
                if (prop == null && !string.IsNullOrWhiteSpace(column.MappedTo))
                {
                    prop = props.FirstOrDefault(p => p.HasDisplayName && p.DisplayName.Trim().Contains(column.MappedTo));
                }

                // Do type conversion
                if (prop != null)
                {
                    var propertyType = prop.PropertyType;
                    object parsedValue = null;
                    if (propertyType == typeof(int?) || propertyType == typeof(int))
                    {
                        int val;
                        if (!int.TryParse(valueStr, out val))
                        {
                            val = default(int);
                        }
                        parsedValue = val;
                    }
                    else if (propertyType == typeof(short?) || propertyType == typeof(short))
                    {
                        short val;
                        if (!short.TryParse(valueStr, out val))
                            val = default(short);
                        parsedValue = val;
                    }
                    else if (propertyType == typeof(long?) || propertyType == typeof(long))
                    {
                        long val;
                        if (!long.TryParse(valueStr, out val))
                            val = default(long);
                        parsedValue = val;
                    }
                    else if (propertyType == typeof(decimal?) || propertyType == typeof(decimal))
                    {
                        decimal val;
                        if (!decimal.TryParse(valueStr, out val))
                            val = default(decimal);
                        parsedValue = val;
                    }
                    else if (propertyType == typeof(double?) || propertyType == typeof(double))
                    {
                        double val;
                        if (!double.TryParse(valueStr, out val))
                            val = default(double);
                        parsedValue = val;
                    }
                    else if (propertyType == typeof(DateTime?) || propertyType == typeof(DateTime))
                    {
                        if (value is DateTime)
                        {
                            parsedValue = value;
                        }
                        else
                        {
                            try
                            {
                                DateTime output;
                                if (DateTime.TryParse(value, out output))
                                {
                                    parsedValue = output;
                                }
                                else
                                {
                                    parsedValue = convertDateTime(Double.Parse(value));
                                }
                            }
                            catch
                            {
                                if (propertyType == typeof(DateTime))
                                {
                                    parsedValue = DateTime.MinValue;
                                }
                            }
                        }
                    }
                    else if (propertyType.IsEnum)
                    {
                        try
                        {
                            parsedValue = Enum.ToObject(propertyType, int.Parse(valueStr));
                        }
                        catch
                        {
                            parsedValue = Enum.ToObject(propertyType, 0);
                        }
                    }
                    else if (propertyType == typeof(string))
                    {
                        parsedValue = valueStr;
                    }
                    else
                    {
                        try
                        {
                            parsedValue = Convert.ChangeType(value, propertyType);
                        }
                        catch
                        {
                            parsedValue = valueStr;
                        }
                    }
                    try
                    {
                        prop.PropertyInfo.SetValue(item, parsedValue);
                    }
                    catch (Exception ex)
                    {
                        // Indicate parsing error on row?
                    }
                }
            });
            retList.Add(item);
        }
        return retList;
    }
}

It looks like a lot, but most of the code is for handling the type conversion, and if there are any sorts of offsets by row or column.

It’s worth noting that you do need the column ordering defined on the class. So, an example user would look like this:

public class User
{
    [Display(Name = "ID", Order = 1)]
    public string Id { get; set; }

    [Display(Name = "FirstName", Order = 2)]
    public string FirstName { get; set; }

    [Display(Name = "LastName", Order = 3)]
    public string LastName { get; set; }

    [Display(Name = "PhoneNumber", Order = 4)]
    public string PhoneNumber { get; set; }

    [Display(Name = "BDay", Order = 5)]
    public DateTime Birthday { get; set; }
}

I have created a GitHub repository that contains this code, along with examples, located here: https://github.com/mvalenta/CsvParserExamples. In this repository, there are two examples, one for a web version where you would upload a csv file (with a sample file.csv that can be uploaded), and a console version that’s based off of a generic stream built from a string. Each example is it’s own .NET Core project, and should help with usage.

One comment

  1. Nice attempt, but it doesn’t handle all CSV files correctly. If you take a look at the RFC standard for CSV files, you’ll quickly realise why. Here are some pointers:

    – CSV may or may not have a header row.

    – Any field can be contained in quotes.

    – Line breaks within a field are allowed as long as they are wrapped in quotation marks. So you cannot simply split by line break like you did without taking this into consideration.

    – Commas within a field are allowed as long as they are wrapped in quotation marks. So you cannot simply split by comma like you did without taking this into consideration.

    – Quote marks within a field are escaped by doubling them. If you don’t handle this, you end up with extra quotes.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.