DEV Community

bristolsamo

C# CSV Parser (Step-by-Step Tutorial)

It's almost a rite of passage for a junior developer to cobble together their own CSV parser using a simple string.Split(','), and then sooner or later learn that there is a bit more to this whole CSV thing than simply separating values by a comma. In truth, CSVs have many nuances around how line breaks are handled within fields or how fields can be contained inside quotes, and these break a naive string-split approach.
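To make the problem concrete, here is a minimal sketch of the naive approach applied to a line with a quoted comma. The class and method names are made up for illustration.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Splits on every comma and ignores CSV quoting rules entirely.
    public static string[] NaiveSplit(string line) => line.Split(',');

    public static void Main()
    {
        // Two logical fields: a quoted name that contains a comma, and an age.
        string line = "\"Smith, John\",42";

        string[] parts = NaiveSplit(line);

        // The comma inside the quotes is treated as a separator,
        // so the line is cut into three pieces instead of two.
        Console.WriteLine(parts.Length); // 3
        foreach (string part in parts)
        {
            Console.WriteLine($"[{part}]");
        }
    }
}
```

The name is cut in half and the quote characters are left in the data, which is exactly the kind of bug the rest of this article tries to avoid.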

What is CSV?

CSV (Comma-Separated Values) is a popular import and export data format used by spreadsheets and databases. Typically, each record is stored on one line and its fields are separated by commas. It is worth using a CSV helper package in our projects. CSV is a simple data format, but there can be many variations. These may include different delimiters, new lines, or quotes. It is possible to read and write CSV data with the help of a CSV helper library.

CSV Gotchas

A few CSV problems should be brought up before we dive deeper. Hopefully, they explain why rolling your own parser is sometimes more pain than it's worth.

  • A CSV may or may not have a header row. If there is a header row, then the order of the columns is not important because you can detect what is actually in each column. If there is no header row, then you rely on the order of the columns being the same. Any CSV parser should be able to read columns both by "header" value and by index.

  • Any field may be contained in quotes. However, fields that contain a line break, commas, or quotation marks must be contained in quotes.

  • To re-emphasize the above, line breaks within a field are allowed in a CSV as long as they are wrapped in quotation marks. This is what trips up most people who read the file line by line as if it were a regular text file.

  • Quote marks within a field are written as doubled quote marks (as opposed to, say, an escape character like a backslash).

  • All rows should have the same number of columns, but in the RFC this is labeled as a "should" and not a "must."

  • While, sure, the C in CSV stands for comma, ideally a CSV parser can also handle TSV (that is, using tabs instead of commas).
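The quoting rules above can be sketched as a small character-by-character parser. This is only an illustration under simplifying assumptions (the whole text is in memory, blank lines are skipped, and malformed quoting is not reported); `MiniCsv` and its members are names made up for this example, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MiniCsv
{
    // Parses CSV text into rows of fields. Handles quoted fields, doubled
    // quotes ("") inside quoted fields, and line breaks inside quotes.
    public static List<string[]> Parse(string text, char delimiter = ',')
    {
        var rows = new List<string[]>();
        var row = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool rowHasContent = false; // true once the current row has seen any data

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < text.Length && text[i + 1] == '"')
                {
                    field.Append('"'); // doubled quote means one literal quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false; // closing quote
                }
                else
                {
                    field.Append(c); // delimiters and line breaks are data here
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
                rowHasContent = true;
            }
            else if (c == delimiter)
            {
                row.Add(field.ToString());
                field.Clear();
                rowHasContent = true;
            }
            else if (c == '\r' || c == '\n')
            {
                // Treat \r\n as a single line break.
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
                if (rowHasContent)
                {
                    row.Add(field.ToString());
                    rows.Add(row.ToArray());
                }
                row.Clear();
                field.Clear();
                rowHasContent = false;
            }
            else
            {
                field.Append(c);
                rowHasContent = true;
            }
        }

        // Flush the last row when the text does not end with a line break.
        if (rowHasContent)
        {
            row.Add(field.ToString());
            rows.Add(row.ToArray());
        }
        return rows;
    }

    public static void Main()
    {
        string csv = "Name,Note\r\n\"Smith, John\",\"said \"\"hi\"\"\nbye\"\r\n";
        List<string[]> rows = Parse(csv);
        Console.WriteLine(rows.Count);  // 2
        Console.WriteLine(rows[1][0]);  // Smith, John
    }
}
```

Passing `'\t'` as the delimiter gives a rough TSV reader with the same quoting behavior. For real workloads a maintained library is usually the safer choice.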

This is just for parsing CSV files into primitives, but in something like .NET, we will also want:

  • Deserializing into a list of objects

  • Handling of Enum values

  • Custom mappings (so the header value may or may not match the name of the class property in C#)

  • Mapping of nested objects
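As a rough sketch of what header-based lookup, a custom mapping, and enum handling involve, the example below maps already-split rows onto objects. `Customer`, `Status`, and `HeaderMapping` are hypothetical names, and the rows are assumed to have been parsed correctly beforehand.

```csharp
using System;
using System.Collections.Generic;

public enum Status { Active, Inactive }

public class Customer
{
    public string FullName { get; set; }
    public Status Status { get; set; }
}

public static class HeaderMapping
{
    // Maps parsed rows (first row = header) onto Customer objects.
    public static List<Customer> Map(IReadOnlyList<string[]> rows)
    {
        // Build a header -> column index lookup so column order does not matter.
        var columns = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < rows[0].Length; i++)
        {
            columns[rows[0][i].Trim()] = i;
        }

        var result = new List<Customer>();
        for (int r = 1; r < rows.Count; r++)
        {
            string[] row = rows[r];
            result.Add(new Customer
            {
                // Custom mapping: the "name" header feeds the FullName property.
                FullName = row[columns["name"]],
                // Enum handling: parse the text value, ignoring case.
                Status = (Status)Enum.Parse(typeof(Status), row[columns["status"]], ignoreCase: true)
            });
        }
        return result;
    }

    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "status", "name" },   // header order differs from property order
            new[] { "active", "Ada" },
            new[] { "Inactive", "Bob" }
        };

        foreach (Customer c in Map(rows))
        {
            Console.WriteLine($"{c.FullName}: {c.Status}");
        }
    }
}
```

Libraries such as the ones covered later do this through attributes or mapping classes, so hand-written code like this is mainly useful for understanding what they do for you.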

What is a CSV file parser?

The CSVParser class is an abstract class that helps to parse a file (or stream) of comma-separated values. Being abstract, it must be inherited by a programmer-developed class, which must, at a minimum, implement the abstract methods. Most of the methods in the CSVParser class are virtual, allowing the programmer to override their functionality with new or supplementary processing.

The code takes a comma-delimited text file (or stream) and parses each line into discrete data fields.
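The source of that class is not shown here, so the following is only a sketch of the pattern it describes: an abstract base with one abstract method and one virtual method. All names (`CsvParserBase`, `SplitLine`, `OnRecord`, `CollectingParser`) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical base class; the names are illustrative, not a real library's API.
public abstract class CsvParserBase
{
    // Reads the input line by line and hands each split record to the subclass.
    public void Parse(TextReader reader)
    {
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            OnRecord(SplitLine(line), lineNumber);
        }
    }

    // Virtual: the default is a deliberately naive split that subclasses
    // may override to support quoting or other delimiters.
    protected virtual string[] SplitLine(string line) => line.Split(',');

    // Abstract: every subclass must decide what to do with a record.
    protected abstract void OnRecord(string[] fields, int lineNumber);
}

// A concrete parser that simply collects every record it is given.
public sealed class CollectingParser : CsvParserBase
{
    public List<string[]> Records { get; } = new List<string[]>();

    protected override void OnRecord(string[] fields, int lineNumber) => Records.Add(fields);
}

public static class AbstractParserDemo
{
    public static void Main()
    {
        var parser = new CollectingParser();
        parser.Parse(new StringReader("a,b,c\n1,2,3"));
        Console.WriteLine(parser.Records.Count);  // 2
        Console.WriteLine(parser.Records[1][2]);  // 3
    }
}
```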

CSVs have plenty of issues with how line breaks are handled in fields or how fields can be contained in quotes, and these defeat a simple string-split approach. I recently looked into the following options for parsing CSVs in C# and .NET that go beyond a simple string.Split(',') to separate the values at each comma. In this article, we look at the best ways to parse CSV files in C# and .NET.

Considerations for processing data

The following should always be considered when importing CSV files.

  • All columns are suspect to be missing altogether or missing in one or more rows.

  • Mixed data types: consider a column with dates where some rows may have malformed dates or dates formatted for a different culture, or columns that should be numeric where some rows have no value or an unexpected format, and so on.

  • Columns that have values that aren't valid for your business, e.g., a list of products that needs to map to a product table where there are products that you don't handle, yet the incoming data has values 1 through 100.

  • The file is in use by another process and is locked.

  • The file is extremely large and processing may take hours; have a plan to run a nightly job.

  • Dealing with rows/columns that don't fit into the database; have a plan to deal with them.

  • Offering clients a method (or methods) to review suspect data and to modify or reject it.

  • Consider an intermediate database table to process suspect data, which can be done over time. There may be a large data set that takes hours or days to process.

Keep in mind that working with CSV files is a puzzle no matter what the structure should be, and that parsing different files usually comes with its own quirks.
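For the mixed-data-type and culture concerns in particular, one possible guard is to parse with an explicit format and the invariant culture rather than the machine's current culture. The sketch below assumes the file is expected to contain ISO-style dates (yyyy-MM-dd) and '.' as the decimal separator; `FieldParsing` and its methods are illustrative names.

```csharp
using System;
using System.Globalization;

public static class FieldParsing
{
    // Accepts only yyyy-MM-dd so that ambiguous dates such as 03/04/2021 are rejected.
    public static bool TryParseDate(string raw, out DateTime value) =>
        DateTime.TryParseExact(raw?.Trim(), "yyyy-MM-dd", CultureInfo.InvariantCulture,
                               DateTimeStyles.None, out value);

    // Parses decimals with '.' as the decimal separator regardless of the machine's culture.
    // NumberStyles.Float does not allow thousands separators, so "1,5" is rejected
    // instead of being silently read as 15.
    public static bool TryParseDecimal(string raw, out decimal value) =>
        decimal.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out value);

    public static void Main()
    {
        Console.WriteLine(TryParseDate("2021-03-04", out _));  // True
        Console.WriteLine(TryParseDate("03/04/2021", out _));  // False
        Console.WriteLine(TryParseDecimal("12.50", out _));    // True
        Console.WriteLine(TryParseDecimal("1,5", out _));      // False
        Console.WriteLine(TryParseDecimal("", out _));         // False
    }
}
```

Rows that fail these checks can then be routed to a review list instead of being dropped or guessed at.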

Parse CSV Files With the TextFieldParser Class in C#

To use the TextFieldParser class, we have to reference Microsoft.VisualBasic.dll in our C# code. The TextFieldParser class contains many methods for parsing structured text files in C#.

We can read a CSV file with the TextFieldParser class by setting the delimiters with the SetDelimiters() function of the TextFieldParser class.

The code example below shows us how to parse data from a CSV file with the TextFieldParser class in C#.



using System;
using Microsoft.VisualBasic.FileIO;

namespace parse_csv
{
    class Program
    {
        static void Main(string[] args)
        {
            using (TextFieldParser textFieldParser = new TextFieldParser(@"C:\File\Sheet1.csv"))
            {
                textFieldParser.TextFieldType = FieldType.Delimited;
                textFieldParser.SetDelimiters(",");
                while (!textFieldParser.EndOfData)
                {
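                    // ReadFields() returns the fields of the current row as strings.
                    string[] fields = textFieldParser.ReadFields();
                    Console.WriteLine(string.Join(" | ", fields));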
                }
            }
        }
    }
}



In the above code, we initialized the instance textFieldParser of the TextFieldParser class by specifying the path to our CSV file in the constructor.

We then set the text field type to delimited with textFieldParser.TextFieldType = FieldType.Delimited and set "," as the delimiter with the textFieldParser.SetDelimiters(",") function.

We then used a while loop to read the CSV file to the end, checking textFieldParser.EndOfData on each pass. We stored the data of each row in an array of strings with the ReadFields() function.
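TextFieldParser can also read from any TextReader, which makes it easy to try out on an in-memory string. The sketch below does that with a quoted field containing a comma; `TextFieldParserDemo` and `ReadAll` are names made up for this example, and HasFieldsEnclosedInQuotes is set explicitly only for clarity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserDemo
{
    // Reads every row from the reader and returns the fields of each row.
    public static List<string[]> ReadAll(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Quoted fields may contain the delimiter; stated explicitly for clarity.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }

    public static void Main()
    {
        string csv = "Name,Age\n\"Smith, John\",42\n";
        foreach (string[] fields in ReadAll(new StringReader(csv)))
        {
            Console.WriteLine(string.Join(" | ", fields));
        }
    }
}
```

Here the quoted comma stays inside a single field, which is the behavior a plain Split(',') cannot give you.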

Parse Data From a CSV File With the FileHelpers Parser Library in C#

In C#, we have a file parser that parses a file based on its contents. TextFieldParser is defined in the Microsoft.VisualBasic.FileIO namespace. Before running a program that uses it, don't forget to add a reference to Microsoft.VisualBasic.

The FileHelpers library reads and writes data to files, streams, and strings in C#. It is a third-party library and does not come pre-installed with the .NET framework. We can easily install it by searching for it in the NuGet Package Manager inside the Visual Studio IDE.

We can use the FileHelperEngine class to parse data from a CSV file in C#. The FileHelperEngine class reads the data from the file into class objects in C#.

So, we must first create a model class that can hold our data from the file. The class contains fields that represent the columns in the CSV file. We use the [DelimitedRecord(",")] attribute to specify that "," is the delimiter.

The ReadFile(path) function reads the data from the file at the given path into an array of class objects. The following code example shows how to parse a CSV file with the FileHelpers library in C#.



using FileHelpers;
using System;
namespace parse_csv
{
    [DelimitedRecord(",")]
    public class Record
    {
        public string Name;

        public string Age;
    }
    class Program
    {
        static void Main(string[] args)
        {
            var fileHelperEngine = new FileHelperEngine<Record>();
            var records = fileHelperEngine.ReadFile(@"C:\File\records.csv");

            foreach (var record in records)
            {
                Console.WriteLine(record.Name);
                Console.WriteLine(record.Age);
            }
        }
    }
}



The above code reads the file and stores it in an array of objects of the Record class with the FileHelpers library in C#.
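If the file has a header row or quoted fields, FileHelpers offers attributes for that. The sketch below is an assumption-laden variation on the example above: it requires the FileHelpers NuGet package, `Person` and `FileHelpersDemo` are made-up names, and it reads from a string with ReadString instead of a file so it can be tried without creating one.

```csharp
using System;
using FileHelpers;

[DelimitedRecord(",")]
[IgnoreFirst(1)] // skip the header line
public class Person
{
    // Allow the name to be wrapped in double quotes so it may contain commas.
    [FieldQuoted('"', QuoteMode.OptionalForBoth)]
    public string Name;

    // FileHelpers converts the text to int based on the field type.
    public int Age;
}

public static class FileHelpersDemo
{
    // Parses CSV text held in memory into Person records.
    public static Person[] Parse(string csv)
    {
        var engine = new FileHelperEngine<Person>();
        return engine.ReadString(csv);
    }

    public static void Main()
    {
        Person[] people = Parse("Name,Age\r\n\"Smith, John\",42\r\n");
        foreach (Person p in people)
        {
            Console.WriteLine($"{p.Name} is {p.Age}");
        }
    }
}
```

Typing Age as int also means a non-numeric value surfaces as a conversion error instead of slipping through as text.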

Parsing data using StreamReader 

In C#, the StreamReader class is used to work with files. It opens, reads, and helps perform other operations on different types of files. We can also perform specific operations on CSV files while using this class.

Parsing CSV files

First, check to make sure the file to parse exists. In the following code blocks, mHasException and mLastException come from a base exception class that the parsing class inherits. The return type is a ValueTuple (installed using the NuGet package manager).

For example:



if (!File.Exists(_inputFileName))
{
    mHasException = true;
    mLastException = new FileNotFoundException($"Missing {_inputFileName}");
    return (mHasException, new List<DataItem>(),new List<DataItemInvalid>() );
}



If the file exists, the next step is to set up several variables that will be used for validation purposes, along with the return types that will contain the valid and, if present, invalid data read in from the CSV file.



var validRows = new List<DataItem>();
var invalidRows = new List<DataItemInvalid>();
var validateBad = 0;

int index = 0;

int district = 0;
int grid = 0;
int nCode = 0;
float latitude = 0;
float longitude = 0;



The next code block follows the code block above.

A while statement is used to loop through each line in the CSV file. Each line is split by a comma, which in this case is the delimiter and also the most common one. Next, validate that there are nine elements in the string array. If there aren't nine elements in the array, place the line into a possible reject container.

Note that the first line contains column names and is skipped by checking the index/line number stored in the variable index.

Following the check for nine elements in a line, seven elements in the string array are checked to make sure they can be converted to the expected data type, ranging from dates to numerics and non-empty string values.

Beyond the type check above, the section beneath the comment Questionable fields performs several more checks, e.g., does the NCIC field contain data that is not in an expected range.

Note that not all data can be checked here, such as the data in parts[3], as this may depend on the data in other elements of the array, so this is left to the review process, which will provide a grid with a dropdown of valid options to pick from.

If there are issues to review in a record, the Inspect property is set to flag the record for manual review, and a list is created.



try
{
    using (var readFile = new StreamReader(_inputFileName))
    {
        string line;
        string[] parts;

        while ((line = readFile.ReadLine()) != null)
        {
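            parts = line.Split(',');

            // Count each line exactly once so index matches the 1-based line number
            // (Split never returns null, so no null check is needed).
            index += 1;
            validateBad = 0;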
            validateBad = 0;

            if (parts.Length != 9)
            {
                invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
                continue;

            }

            // Skip first row which in this case is a header with column names
            if (index <= 1) continue;
            /*
             * These columns are checked for proper types
             */
            var validRow = DateTime.TryParse(parts[0], out var d) &&
                           float.TryParse(parts[7].Trim(), out latitude) &&
                           float.TryParse(parts[8].Trim(), out longitude) &&
                           int.TryParse(parts[2], out district) &&
                           int.TryParse(parts[4], out grid) &&
                           !string.IsNullOrWhiteSpace(parts[5]) &&
                           int.TryParse(parts[6], out nCode);

            /*
             * Questionable fields
             */
            if (string.IsNullOrWhiteSpace(parts[1]))
            {
                validateBad += 1;
            }
            if (string.IsNullOrWhiteSpace(parts[3]))
            {
                validateBad += 1;
            }

            // NCIC code must be 909 or greater
            if (nCode < 909)
            {
                validateBad += 1;
            }

            if (validRow)
            {

                validRows.Add(new DataItem()
                {
                    Id = index,
                    Date = d,
                    Address = parts[1],
                    District = district,
                    Beat = parts[3],
                    Grid = grid,
                    Description = parts[5],
                    NcicCode = nCode,
                    Latitude = latitude,
                    Longitude = longitude,
                    Inspect = validateBad > 0
                });

            }
            else
            {
                // fields to review in specific rows
                invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
            }
        }
    }
}
catch (Exception ex)
{
    mHasException = true;
    mLastException = ex;
}



As soon as the above example has finished, the following line of code returns the data to the calling form/window as a ValueTuple.



return (IsSuccessFul, validRows, invalidRows);


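The DataItem and DataItemInvalid types are not shown in this article. A minimal sketch of what they might look like, inferred only from the properties assigned in the parsing loop, is given below together with one way a caller could consume the returned tuple. `LoadStub` is a hypothetical stand-in for the parsing method and returns sample data.

```csharp
using System;
using System.Collections.Generic;

// Shapes inferred from the properties set in the parsing loop; assumptions, not the author's code.
public class DataItem
{
    public int Id { get; set; }
    public DateTime Date { get; set; }
    public string Address { get; set; }
    public int District { get; set; }
    public string Beat { get; set; }
    public int Grid { get; set; }
    public string Description { get; set; }
    public int NcicCode { get; set; }
    public float Latitude { get; set; }
    public float Longitude { get; set; }
    public bool Inspect { get; set; } // true when a questionable field needs manual review
}

public class DataItemInvalid
{
    public int Row { get; set; }     // line number in the CSV file
    public string Line { get; set; } // the raw line, kept for review
}

public static class TupleDemo
{
    // Hypothetical stand-in for the parsing method: same tuple shape, sample data.
    public static (bool Success, List<DataItem> Valid, List<DataItemInvalid> Invalid) LoadStub()
    {
        var valid = new List<DataItem>
        {
            new DataItem { Id = 2, Description = "Sample", NcicCode = 909, Inspect = false }
        };
        var invalid = new List<DataItemInvalid>
        {
            new DataItemInvalid { Row = 3, Line = "too,few,columns" }
        };
        return (true, valid, invalid);
    }

    public static void Main()
    {
        // Deconstruct the tuple into separate variables.
        var (success, valid, invalid) = LoadStub();
        Console.WriteLine($"Success: {success}, valid: {valid.Count}, needs review: {invalid.Count}");
    }
}
```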

Parsing data using OleDb

This approach "reads" lines from CSV files, with the drawback that fields are not typed and that it carries more baggage than needed for processing lines from the CSV file, which makes a difference in processing time with larger CSV files, as in the example below.



public DataTable LoadCsvFileOleDb()
{
    var connString = $@"Provider=Microsoft.Jet.OleDb.4.0;.....";

    var dt = new DataTable();

    try
    {
        using (var cn = new OleDbConnection(connString))
        {
            cn.Open();

            var selectStatement = "SELECT * FROM [" + Path.GetFileName(_inputFileName) + "]";

            using (var adapter = new OleDbDataAdapter(selectStatement, cn))
            {
                var ds = new DataSet("Demo");
                adapter.Fill(ds);
                ds.Tables[0].TableName = Path.GetFileNameWithoutExtension(_inputFileName);
                dt = ds.Tables[0];
            }
        }
    }
    catch (Exception ex)
    {
        mHasException = true;
        mLastException = ex;
    }

    return dt;
}



Parsing CSV files in C# using IronXL

IronXL is a .NET library developed by Iron Software. This library provides functions and APIs to help us read, create, and update/edit our Excel files and spreadsheets. IronXL does not require Excel or Interop to be installed on your server. Furthermore, IronXL provides a faster and more intuitive API than Microsoft Office Interop Excel.

With IronXL, it is quite easy to parse data from CSV files and to create a CSV parser. With just two lines of code, you can load a CSV file and convert it to Excel.

Adding the IronXL NuGet Package

Before you can use IronXL to read CSV files in MVC, ASP, or .NET Core, you need to install it. Here is a quick walk-through.

In Visual Studio, select the Project menu

Manage NuGet Packages

Search for IronXL.Excel

Install

When you need to read CSV files in C#, IronXL is a great tool. You can read a CSV file with commas, or any other delimiter, as seen in the code segments below.



WorkBook workbook = WorkBook.LoadCSV("Weather.csv", fileFormat: ExcelFileFormat.XLSX, ListDelimiter: ",");
WorkSheet ws = workbook.DefaultWorkSheet;
workbook.SaveAs("Csv_To_Excel.xlsx");



1. Create a New Project

After you have installed IronXL, create a new project and add the IronXL namespace:

using IronXL;

2. Load a CSV File into Excel

The code below uses the WorkBook object's Load method to load a file into Excel. This file is then parsed. Finally, it uses the SaveAs method to save the file in CSV format.



private void button4_Click(object sender, EventArgs e)
{
    WorkBook wb = WorkBook.Load("Normal_Excel_File.xlsx"); //Import .xls, .csv, or .tsv file
    wb.SaveAs("Parsed_CSV.csv"); //Exported as : Parsed_CSV.Sheet1.csv
}




Remember to create an Excel workbook named Normal_Excel_File.xlsx containing the following data:
[Image: contents of Normal_Excel_File.xlsx]

Export the Parsed CSV file

The exported CSV file will be saved as Parsed_CSV.Sheet1.csv because the data is on Sheet1 inside the Excel workbook. Below is what the file would look like in File Explorer.
[Image: Parsed_CSV.Sheet1.csv in File Explorer]

IronXL license keys allow you to deploy your project live with no watermark.

Licenses start at just $499 and include one free year of support and updates.

With a trial license key, you can also try IronXL free for 30 days.

Iron Software offers you the opportunity to get their complete package at a lower price.

The Iron Software suite comprises five components: IronPDF, IronXL, IronOCR, IronBarcode, and IronWebScraper.

You can get the whole bundle with a single payment.

It's definitely an opportunity worth going for.

You can download the software product from this link.
