
Rust for the Novice Programmer/CSV Program/CSV File and Parsing

From Wikibooks, open books for an open world

Adding CSV file and Parsing


OK, let's once again create a new Cargo project with 'cargo new csv-program'.

As our test CSV file, we will use this simple table of letter frequencies:

"Letter","Frequency","Percentage"
"A",24373121,8.1
"B",4762938,1.6
"C",8982417,3.0
"D",10805580,3.6
"E",37907119,12.6
"F",7486889,2.5
"G",5143059,1.7
"H",18058207,6.0
"I",21820970,7.3
"J",474021,0.2
"K",1720909,0.6
"L",11730498,3.9
"M",7391366,2.5
"N",21402466,7.1
"O",23215532,7.7
"P",5719422,1.9
"Q",297237,0.1
"R",17897352,5.9
"S",19059775,6.3
"T",28691274,9.5
"U",8022379,2.7
"V",2835696,0.9
"W",6505294,2.2
"X",562732,0.2
"Y",5910495,2.0
"Z",93172,0.0

Simply copy and paste this into a new file called 'letter_frequency.csv' in the project root, that is, directly under csv-program. Then we can start coding.

First, we will read the file into a String in Rust. For this, we can use the very convenient function std::fs::read_to_string. Again, we will add 'use std::fs;' so that we can call it as fs::read_to_string.

use std::fs;
fn main() {
    let csv_file = fs::read_to_string("letter_frequency.csv").unwrap();
   
    println!("{}", csv_file);
}

If we run this, we should see the contents of the file printed to the terminal!
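
Because read_to_string returns a Result, the unwrap above will panic if the file cannot be read, for example when it is missing. As a minimal sketch of one alternative, the helper read_csv below (a hypothetical name chosen for this illustration, not part of the program we are building) hands the Result back to the caller, which can then print a friendlier message:

```rust
use std::fs;
use std::io;

// Hypothetical helper: returns the file contents, or the I/O error
// (for example "file not found") instead of panicking.
fn read_csv(path: &str) -> Result<String, io::Error> {
    fs::read_to_string(path)
}

fn main() {
    match read_csv("letter_frequency.csv") {
        Ok(csv_file) => println!("{}", csv_file),
        Err(error) => println!("Could not read the file: {}", error),
    }
}
```

For the rest of this page we keep using unwrap, which is fine while we control the input file ourselves.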

Now we want to parse the file line by line. The first line should be handled differently, as it contains the headers of the file. To do this, we can use the lines() method on a str. This gives us an iterator, so we first need to learn the basics of iterators to use it for what we want.

How iterators work

An iterator is a type that implements the trait 'Iterator'. Apart from naming the type of item it yields (Item), this trait simply requires that the type has the function:

fn next(&mut self) -> Option<Self::Item>;

All other behaviour is built on top of this function. Notable about this function is that it requires a mutable reference to the iterator, because each call advances it: an item that has been returned is not returned again, so once the iterator has run through all of its items it is used up. Another notable thing is that it returns an Option of the iterator's item type, which is how we know whether it is finished: it returns Some(item) if the iterator has items left and None if the iterator has finished.
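
To make this concrete, here is a small sketch of a hand-written iterator. The Countdown struct is a hypothetical example type invented for this illustration; it only provides next, and the for loop and collect() work on it without any further code:

```rust
// Hypothetical example type: counts down from `remaining` to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    // The type of item this iterator yields.
    type Item = u32;

    // Returns Some(item) while items are left, then None.
    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1; // advancing is why we need `&mut self`
            Some(current)
        }
    }
}

fn main() {
    let mut countdown = Countdown { remaining: 3 };
    println!("{:?}", countdown.next()); // prints Some(3)

    // The for loop keeps calling next() until it gets None,
    // so it sees only the remaining items: 2 and 1.
    for number in countdown {
        println!("{}", number);
    }
}
```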

Therefore we can do this to get the first line:

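use std::fs;

fn main() {
    let csv_file = fs::read_to_string("letter_frequency.csv").unwrap();

    // lines() gives us an iterator over the lines of the file
    let mut lines_iter = csv_file.lines();
    // take the header line off the front of the iterator
    let first_line = lines_iter.next().unwrap();
    for line in lines_iter {
        //parse individual line
    }
}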

Note that we can still use the iterator in a normal for loop after taking the first line from it; the loop will iterate through all the lines except the first. Also, we simply unwrap the first line: since we are supplying the file ourselves, we can assume it has a first line, but this may not be true for all files, and unwrap will panic if the file is empty.

To split each line up at the commas, we can use the split function, which takes a pattern such as a string or a char and returns an iterator over the parts of the string between the matches. We want to store these parts in a vector, which can be done using the collect() function, so let's wrap this all into its own function:
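
If we did not control the input, one way to avoid the unwrap is to treat a missing first line as a normal case. The following is only a sketch under that assumption; header_and_rows is a hypothetical helper name, and it works on a string rather than a file so that it is easy to try out:

```rust
// Hypothetical helper: splits the text into its first line and the rest.
// Returns None if the text has no lines at all.
fn header_and_rows(text: &str) -> Option<(&str, Vec<&str>)> {
    let mut lines_iter = text.lines();
    // `?` returns None from this function if there is no first line.
    let first_line = lines_iter.next()?;
    // The iterator continues after the first line, as in the for loop above.
    Some((first_line, lines_iter.collect()))
}

fn main() {
    match header_and_rows("") {
        Some((header, rows)) => println!("header: {}, data rows: {}", header, rows.len()),
        None => println!("the input has no lines"),
    }
}
```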

fn split_string(input: &str) -> Vec<&str> {
    input.split(',').collect()
}
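
As a quick check of what this function returns, the sketch below calls it on the first data line of our file. Note that split only looks for commas: the double quotes around "A" stay part of the value, and a quoted field that itself contained a comma would be cut in two. Our test file has no such fields, so this simple approach is enough here.

```rust
fn split_string(input: &str) -> Vec<&str> {
    input.split(',').collect()
}

fn main() {
    let values = split_string("\"A\",24373121,8.1");
    // The quotes are still part of the first value.
    println!("{:?}", values); // prints ["\"A\"", "24373121", "8.1"]
}
```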

And then we change our main function as follows:

use std::fs;

// split_string is the function we defined above
fn main() {
    let csv_file = fs::read_to_string("letter_frequency.csv").unwrap();
    let mut lines_iter = csv_file.lines();
    let first_line = lines_iter.next().unwrap();
    let headers = split_string(first_line);
    // do something with headers??
    for line in lines_iter {
        //parse individual line
        let values = split_string(line);

        // do something with values??
    }
}
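
One possible way to fill in the "do something" comments, as a rough sketch: pair each header with the value in the same column. The helper name label_row and the choice to strip the surrounding double quotes with trim_matches are assumptions made for this illustration, not necessarily the approach taken later in the book.

```rust
// Hypothetical helper: pairs each header with the value in the same column.
// `zip` stops at the shorter of the two slices, so extra values are ignored.
fn label_row(headers: &[&str], values: &[&str]) -> Vec<String> {
    headers
        .iter()
        .zip(values.iter())
        .map(|(header, value)| {
            // Remove the surrounding double quotes that the CSV file uses.
            format!("{}: {}", header.trim_matches('"'), value.trim_matches('"'))
        })
        .collect()
}

fn main() {
    let headers = vec!["\"Letter\"", "\"Frequency\"", "\"Percentage\""];
    let values = vec!["\"A\"", "24373121", "8.1"];
    for field in label_row(&headers, &values) {
        println!("{}", field);
    }
}
```

Inside the for loop of our main function, the same call would be label_row(&headers, &values).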

Next: parsing numbers from our csv file