Thursday, November 7, 2013

Another primitive File I/O gotcha!

Part of the feature I'm writing loads multiple regexes from a file. The file is simple: each line is one regular expression.

\b(cat|kitten)\b
\b(dog|curr)\b
\b(buffalo|tamaraw)\b

Now I needed to test the method that uses these regexes:

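import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Animals {
    private final List<String> regexes;
    public Animals(List<String> regexes) {
        this.regexes = regexes;
    }
    public String detectAnimal(String input) {
        // For each regex, create a java.util.regex.Pattern and, at the first match, return Matcher.group().
        for (String regex : regexes) {
            Matcher matcher = Pattern.compile(regex).matcher(input);
            if (matcher.find()) { return matcher.group(); }
        }
        return null; // assumption: nothing is specified for the no-match case
    }
}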

The test data for animals is around 1 million entries stored in a DB. To test against it, I used Groovy's DataSet to loop through the test data: for each iteration, run the detectAnimal method and save the result to the DB. Easy enough.

Now here's the big Gotcha. Below is the Groovy code I used to build my List of regexes. Note that I used this code only for testing purposes.

File file = new File(Animals.class.getResource(Animals.DEFAULT_CONFIG_FILE).getFile())
Animals animal = new Animals(Arrays.asList(file.getText().split('\n')))

The file had Windows line endings ('\r\n'), so splitting on '\n' alone left a '\r' at the end of every line but the last. The resulting list of regexes looks like this:
\b(cat|kitten)\b\r
\b(dog|curr)\b\r
\b(buffalo|tamaraw)\b
This essentially breaks every regex except the last one, \b(buffalo|tamaraw)\b, because each of the others now only matches when the animal name is followed by a carriage return. Debugging this (I used Eclipse) was tricky, since '\r' (carriage return) is a non-printable character.
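To make the failure easier to see, here is a minimal Java sketch of the same problem. It is not the code from my project: the class name CarriageReturnDemo, its helper methods, and the inline file content are all made up for illustration. It compares a naive split on "\n" with a split that tolerates "\r\n", and escapes control characters so the stray '\r' is visible when printed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CarriageReturnDemo {

    // Naive split, mirroring split('\n'): on CRLF content every line
    // except the last keeps its trailing '\r'.
    public static List<String> naiveSplit(String text) {
        return Arrays.asList(text.split("\n"));
    }

    // Tolerant split: treats both "\r\n" and "\n" as line separators.
    public static List<String> tolerantSplit(String text) {
        return Arrays.asList(text.split("\r?\n"));
    }

    // Escapes CR and LF so they become visible in logs or a debugger.
    public static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    // True if the regex matches anywhere in the input.
    public static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Hypothetical file content with Windows (CRLF) line endings.
        String content = "\\b(cat|kitten)\\b\r\n"
                + "\\b(dog|curr)\\b\r\n"
                + "\\b(buffalo|tamaraw)\\b";
        String input = "a cat and a dog";

        System.out.println("naive split:");
        for (String regex : naiveSplit(content)) {
            System.out.println(visible(regex) + " -> " + finds(regex, input));
        }

        System.out.println("tolerant split:");
        for (String regex : tolerantSplit(content)) {
            System.out.println(visible(regex) + " -> " + finds(regex, input));
        }
    }
}
```

Printing each regex through something like visible() is a cheap way to spot the trailing '\r' that the debugger view hides.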

To complicate things further, the Java code already had a loader for the config file, built on Apache Commons IO FileUtils. That code lives in a project-wide Utils class, which uses a Java static block to load config files. Honestly, I don't like that, since I consider static blocks bad practice.

Honestly, this might have clouded my judgement and kept me from using this Utils class. Also, Groovy was a shiny new language to me, so I was excited to use its new and improved File class. But getText() followed by split('\n') does not strip '\r' the way a line-oriented read does. FileUtils.readLines from Apache Commons would have handled it, and so would Groovy's own File.readLines(), which I overlooked.
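For comparison, here is a small Java sketch of a line-oriented read. It is not the project's Utils code: the class RegexLineReader and its readLines method are hypothetical names, and BufferedReader.lines() assumes Java 8 or later. BufferedReader treats "\n", "\r" and "\r\n" as line terminators and returns each line without them, so the stray '\r' never reaches the regex list.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class RegexLineReader {

    // Reads one regex per line. BufferedReader strips the line terminator,
    // whether it is "\n", "\r" or "\r\n".
    public static List<String> readLines(Reader source) {
        return new BufferedReader(source).lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the config file, with CRLF line endings.
        String crlfContent = "\\b(cat|kitten)\\b\r\n"
                + "\\b(dog|curr)\\b\r\n"
                + "\\b(buffalo|tamaraw)\\b";

        // Brackets make any leftover whitespace obvious when printed.
        for (String regex : readLines(new StringReader(crlfContent))) {
            System.out.println("[" + regex + "]");
        }
    }
}
```

For a real file, the same method could be handed a reader from Files.newBufferedReader inside a try-with-resources block so the file is closed afterwards.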

