How to deal with large data sets in NetSuite CSV Import

Large-ish Data Sets

The other day I was asked to help with a data upload. It was a pretty simple upload into a custom record, and I didn’t have to do a ton of data manipulation. The problem was the data set itself: NetSuite’s CSV Import tool caps each file at 25,000 rows (rows, not records: the header counts against the limit), and this particular data set had nearly 320,000 rows.
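As a quick sanity check (my numbers here are approximate, matching the “nearly 320,000” above), ceiling division tells you how many chunk files to expect:

```shell
# Back-of-the-envelope: roughly 320,000 lines, one of which is the header.
TOTAL_DATA_ROWS=319999
CHUNK_SIZE=24999   # max data rows per file, leaving room for the header

# Ceiling division: the number of chunk files the split will produce.
echo $(( (TOTAL_DATA_ROWS + CHUNK_SIZE - 1) / CHUNK_SIZE ))   # 13
```

So this job was headed for thirteen separate uploads no matter what; the only question was how painful it would be to produce the files.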

Obviously, you could throw this all into Excel, and cut and paste 24,999 rows at a time into individual CSV files, but where’s the fun in that? “If your only tool is a hammer, then everything looks like a nail.” I like to use the right tools for the job.

TL;DR: I pulled out one of my favorite tools for data manipulation and in about 10 seconds I had the file split into chunks with 24,999 data rows each. What is this amazing tool, you ask? It’s called Miller.

Miller is a command-line tool that is awesome for dealing with CSV and TSV files. It runs on Linux, Mac, and Windows. It’s written in Go, and because it processes records as a stream, it can actually handle data sets that are bigger than the available memory on your computer. (Not that we run into that very often, but it’s nice to know that we could if we needed to.)

I actually have a bash script that I keep in my toolbox for just this occasion that looks like this:

#!/usr/bin/env bash
set -eu

# Configuration
INPUT_FILE="$1"
CHUNK_SIZE=24999            # max data rows per file; the header makes 25,000
OUTPUT_PREFIX="chunk_"

# Miller command: -q suppresses normal output, and tee writes each record
# (with a fresh header) to the current chunk file.
mlr --csv put -q '
  begin { @count = 0; @chunk = 1 }
  @count += 1;
  if (@count > '"$CHUNK_SIZE"') {
    @count = 1;
    @chunk += 1;
  }
  tee > "'"$OUTPUT_PREFIX"'" . string(@chunk) . ".csv", $*
' "$INPUT_FILE"

I put that in ~/.local/bin as split-csv-for-netsuite (remember to make it executable with chmod +x ~/.local/bin/split-csv-for-netsuite) so that it’s available whenever I need it. It takes a single argument, the file to split, and spits out files with 24,999 data rows each, ready for NetSuite.

So, if I had a file ./lots-of-data.csv with 30,000 rows, I’d type this:

split-csv-for-netsuite ./lots-of-data.csv

The tool splits that into two files, ./chunk_1.csv and ./chunk_2.csv, and those files are ready to go through the CSV uploader. You can actually queue up multiple imports in NetSuite, so you can set it and forget it.

There are other ways to handle large data sets, like Map/Reduce scripts or Scheduled Scripts, but for quick and dirty one-time uploads, I’ll stick with the CSV importer and my splitter tool.