NCTiles.jl Package Documentation

NCTiles.jl reads and writes NetCDF files that represent, for example, subdivisions of Earth's surface (tiles). Key goals of the package are interoperability with popular climate model grids and MeshArrays.jl, and generation of CF-compliant files.

NCTiles.jl derives from the earlier nctiles implementation in gcmfaces (Forget et al. 2015).

Main Features

NCTiles.jl works in two phases. The first consists of lazy operations on data structures, which gather information about the variables without loading their data. The second calls the write function to instantiate and write the files. Note: some of the included functions are interfaces to MITgcm output.

Data structures:

  • NCvar contains information needed to write a NetCDF file which can include a list of filenames (see BinData) if the data is not loaded into memory.
  • NCData contains a string (NetCDF file name) + metadata to read files.
  • BinData is a container for one field.
  • TileData contains a MeshArray or BinData struct in vals, information about the tile layout in tileinfo, and metadata needed to read/write tile data.

As an example:

struct TileData{T}
    vals::T
    tileinfo::Dict
    tilesize::Tuple
    precision::Type
    numtiles::Int
end

Use examples

DataStructures/06_nctiles.ipynb in the GlobalOceanNotebooks repo provides a series of examples that overlap somewhat with those found in examples/ex*.jl:

  • ex_1.jl reads a binary file containing one interpolated 2D field on a regular grid. It then writes that array to a NetCDF/NCTiles file.
  • ex_2.jl reads data from a NetCDF file containing one tile of model output. It then writes it to a new NetCDF/NCTiles file. This uses 3D data on a non-regular grid for one ocean subdivision (tile).
  • ex_3.jl is an example of interpolated model output processing in CBIOMES where several variables are included in the same NetCDF/NCTiles file.
  • ex_4.jl generates a tiled NetCDF output (i.e., a nctiles output) for a global 2D field on the non-regular LLC90 grid (see MeshArrays.jl). Since the tile width is set to 90, this creates 13 files.
  • ex_5.jl shows how to write a ClimGrid struct from the ClimateTools package to a NetCDF/NCTiles file using NCTiles.
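The count of 13 files quoted for ex_4.jl can be checked with a little arithmetic. The sketch below assumes the usual LLC90 layout of five faces with the sizes listed in the code; those sizes are not stated in this document and are included here only for illustration.

```julia
# Assumed LLC90 face sizes (nx, ny); not taken from this document.
faces = [(90, 270), (90, 270), (90, 90), (270, 90), (270, 90)]
tilesize = (90, 90)

# Each face splits into (nx ÷ 90) * (ny ÷ 90) tiles.
tiles_per_face = [prod(f .÷ tilesize) for f in faces]
ntiles = sum(tiles_per_face)

println("tiles per face: ", tiles_per_face)
println("total tiles (one file each): ", ntiles)
```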

Using NCTiles

The core functionality of NCTiles comes from a series of data structures that contain the information needed to write NetCDF files. This includes the information and methods needed to read from source files. The data structure used for writing a variable is NCvar, which includes that variable's data and metadata. The data itself can be in memory and included directly in the NCvar struct, or it can be described by another class of data structures, with names ending in Data. These include BinData, for data in binary files, NCData, for data in NetCDF files, and TileData, for data to be written out in chunks.

Basic Example

Here we show how to write a NetCDF file from a series of binary data files.

Define Dimensions

The first step for creating a NetCDF file is to define your dimensions. Each dimension is specified by an NCvar. Dimensions should be in an Array in the order corresponding to your variable data (if your data dimensions are lon x lat x time, dimensions should be in that order as well). In this example we have a regular half-degree lat-lon grid with 10 time steps (as in ex_1.jl). This is how we define the dimensions:

lon = -179.75:0.5:179.75
lat = -89.75:0.5:89.75
time = 1:10

dims = [NCvar("lon","degrees_east",size(lon),lon,Dict("long_name" => "longitude"),NCDatasets),
        NCvar("lat","degrees_north",size(lat),lat,Dict("long_name" => "latitude"),NCDatasets),
        NCvar("time","days since 1992-01-01",Inf,time,Dict(("long_name" => "time","standard_name" => "time")),NCDatasets)
        ]
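As a quick sanity check on the grid described above, the following sketch computes the dimension lengths implied by the half-degree ranges, using only Base Julia. The byte count assumes one Float32 2D field per file, as in the BinData example later in this section; the names iosize and bytes_per_field are illustrative.

```julia
lon = -179.75:0.5:179.75
lat = -89.75:0.5:89.75

# Horizontal size of one 2D field on this grid.
iosize = (length(lon), length(lat))

# Expected size of one Float32 field on disk, in bytes (assumes no header).
bytes_per_field = prod(iosize) * sizeof(Float32)

println("grid size: ", iosize)
println("bytes per Float32 field: ", bytes_per_field)
```

Comparing bytes_per_field with filesize of a source file is a cheap way to catch a wrong precision or grid size before writing anything.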

Let's go through the NCvar constructor. Here is the struct definition for reference:

struct NCvar
    name::String
    units::String
    dims
    values
    atts::Union{Dict,Nothing}
    backend::Module
end

The first field, name, should be a String and is what you want to call the variable in the file. The second is the units, which should also be a String. We then specify the dimensions, dims. For dimension variables, dims should be of length 1 (calling size on your dimension values, as above, is sufficient). Next you specify the actual dimension values. For a dimension variable, this must be a one-dimensional array, as above. After the values you can specify any additional attributes that you want to add to the variable as a dictionary. The last field is the backend, which allows you to choose between NCDatasets.jl and NetCDF.jl. We have some support for NetCDF.jl and full support for NCDatasets.jl. Note that creating these NCvar structs does not trigger any CF compliance checks; it is the user's responsibility to provide CF-compliant units.

Define the Data Source

Once you've created the dimensions for your NetCDF file, you can create an NCvar for your variable. Here we are going to create one pointing to data that is stored in multiple binary files, one for each time step. The first step is to create this pointer to the data, which is the BinData struct. For example:

precision = Float32
datapath = "data/binfiles/"
fnames = joinpath.(Ref(datapath),
                    ["Chl050.001.data","Chl050.002.data","Chl050.003.data","Chl050.004.data","Chl050.005.data","Chl050.006.data","Chl050.007.data","Chl050.008.data","Chl050.009.data","Chl050.010.data"])
vardata = BinData(fnames,precision,(length(lon),length(lat)))
# or: vardata = BinData(fnames,precision,(length(lon),length(lat)),1)
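The list of file names above follows a regular pattern, so it can also be generated rather than typed out. This is a minimal sketch using Base Julia's lpad for the three-digit counter; it assumes the files are numbered 001 to 010 exactly as listed above.

```julia
datapath = "data/binfiles/"

# Build "Chl050.001.data" ... "Chl050.010.data" under datapath.
fnames = [joinpath(datapath, "Chl050." * lpad(i, 3, '0') * ".data") for i in 1:10]

println(first(fnames))
println(last(fnames))
```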

And for reference, the struct definition for BinData:

struct BinData # Pointer to data stored in binary files- contains info needed to read in
    fnames::Union{Array{String},String}
    precision::Type
    iosize::Tuple
    fldidx::Int
end

To read data from a binary file, we need to know where the files are and what they are called, the precision in which the data is written, and the dimensions of the data. The first argument, fnames, should be a single file path String or an Array of file path Strings. The second should be the precision in which the data is written in the file, given as a Type; here our data is Float32. Finally, we need to know the size of the data that we are reading from the file, specified as a Tuple. If multiple variables are written in the same file, we can additionally specify the index of the variable we want, for example 10 if it is the 10th variable in the file. In this example there is only one variable in the file, so we can specify 1 or leave the index out, in which case it is assumed to be 1.
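To illustrate how precision, iosize, and the field index together locate one field in a flat binary file, here is a minimal sketch in Base Julia. The helper read_field is hypothetical and is not the NCTiles implementation. It assumes the file has no header, stores fields back to back, and is big-endian; adjust the byte-order conversion if your files differ.

```julia
# Hypothetical helper (not part of NCTiles): read the fldidx-th field of a
# headerless binary file that stores fields of size iosize back to back.
function read_field(fname::AbstractString, prec::Type, iosize::Tuple, fldidx::Int=1)
    fld = Array{prec}(undef, iosize...)
    open(fname, "r") do io
        # Skip the fields stored before the requested one.
        seek(io, (fldidx - 1) * prod(iosize) * sizeof(prec))
        read!(io, fld)
    end
    # Assumption: data on disk is big-endian; convert to host byte order.
    return ntoh.(fld)
end

# Round trip on a temporary file holding two 4x3 Float32 fields.
a = reshape(Float32.(1:12), 4, 3)
b = a .+ 100
tmp = tempname()
open(tmp, "w") do io
    write(io, hton.(a))   # field 1
    write(io, hton.(b))   # field 2
end

first_field = read_field(tmp, Float32, (4, 3))      # fldidx defaults to 1
second_field = read_field(tmp, Float32, (4, 3), 2)
rm(tmp)
```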

Create the NCvar

Now we can create the NCvar for the variable we want to write to the file.

varname = "Chl050"
units = "mg Chl"
longname = "Average chlorophyll concentration (top 50m)"
myvar = NCvar(varname,units,dims,vardata,Dict("long_name" => longname),NCDatasets)

Creating the final NCvar for our variable is similar to creating a dimension NCvar. We specify the name we want to use in the file and the units. Here we use the dims array and the vardata struct we created above. We specify a long_name attribute, and finally indicate that we want to use NCDatasets in the backend.

Writing to the NetCDF File

Assuming you've created the above structs as expected, executing the write function is as simple as:

README = "A useful README that describes the data in the file."
attributes = Dict(["_FillValue"=>NaN, "missing_value" => NaN])
write(myvar,"data/mydata.nc",README=README,globalattribs=attributes)

The write function requires at a minimum an NCvar and the output file path. It writes the NCvar to the file with default global attributes. Additionally you can specify a README and global attributes, by passing a String or Array of Strings to the README keyword argument or by providing a Dict to the globalattribs keyword argument, as shown above.

If you would like to write multiple variables to the same file, you can pass a Dict{String,NCvar} into the write function:

myvars = Dict(["myvar1" => myvar1,
                "myvar2" => myvar2])
write(myvars,"data/mydata.nc")

The keys of the Dict should match the name fields of the NCvar structs they map to.
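A mismatch between keys and names is easy to introduce, so it can help to check before writing. The sketch below uses a stand-in type, NamedVar, which is hypothetical and carries only a name field; the same comparison should apply to a Dict of NCvar structs, since NCvar also has a name field.

```julia
# Stand-in for NCvar with only the field this check needs (hypothetical type).
struct NamedVar
    name::String
end

# True when every key equals the name of the value it maps to.
keys_match(vars::Dict) = all(k == v.name for (k, v) in vars)

good = Dict("Chl050" => NamedVar("Chl050"), "lon" => NamedVar("lon"))
bad = Dict("chl" => NamedVar("Chl050"))

println(keys_match(good))
println(keys_match(bad))
```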

Other Data Structures

In the example above we wrote a NetCDF file with data sourced from Binary Files, specified by the BinData struct. We have a few other structs for different kinds of data:

  • NCData: for data sourced from a NetCDF file
  • TileData: for data to be written into separate tile files

NCData

NCData structs contain the necessary information to read data from a NetCDF file.

struct NCData
    fname::AbstractString
    varname::AbstractString
    backend::Module
    precision::Type
end

For example, if you wanted to use the NetCDF file created before as a data source, you would use the NCData constructor:

myvardata = NCData("data/mydata.nc","Chl050",NCDatasets)

The arguments are the file path, the variable name, and the backend.

Alternatively, we provide the function readncfile which creates NCvars containing the NCData structs for all the variables in the file:

ncvars,ncdims,fileatts = readncfile("data/mydata.nc")

Here, the ncvars dictionary contains NCvars of all the variables in the file. Each NCvar has NCData structs in the values attribute, which avoids reading in all the data from the file. In this case the NCData can be accessed as myvardata = ncvars["Chl050"].values.

To write the same contents out to a new file, run:

write(ncvars,"data/mydata2.nc",globalattribs=fileatts)

You can see this process demonstrated in ex_2.jl.

TileData

The TileData struct is used to chunk up data and write to separate files. We do this using the MeshArrays package. This is demonstrated in more detail in ex_4.jl. First, specify your grid and read in the grid variables:

grid = GridSpec("LatLonCap","grids/GRID_LLC90/")
gridvars = GridLoad(grid)

GridSpec() and GridLoad() are from the MeshArrays package; refer to the MeshArrays documentation for more information about these functions and grids.

The next step is to specify the tile, or chunk, size as a tuple. The data is chunked in the horizontal dimensions only, so the tile size should be a two-element tuple. If the data is three-dimensional, say with full dimensions NxMx10, and the tile size is nxm, the chunks will be nxmx10. Here we set the tile size to 90x90:

tilesize = (90,90)
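The chunking rule can be illustrated on a plain Julia array. This is a simplified sketch for a single rectangular array; it is not how NCTiles or MeshArrays handle multi-face grids, and the function name split_tiles is hypothetical. Only the first two dimensions are split; any trailing dimensions are kept whole.

```julia
# Hypothetical helper: split the two leading dimensions of A into n-by-m tiles,
# keeping any trailing dimensions (e.g. depth or time) intact.
function split_tiles(A::AbstractArray, tilesize::Tuple{Int,Int})
    n, m = tilesize
    N, M = size(A, 1), size(A, 2)
    (N % n == 0 && M % m == 0) ||
        throw(ArgumentError("array size must be a multiple of the tile size"))
    trailing = ntuple(_ -> Colon(), ndims(A) - 2)
    return [view(A, i:i+n-1, j:j+m-1, trailing...) for j in 1:m:M for i in 1:n:N]
end

# A 180x90x10 array split into 90x90 tiles gives two 90x90x10 chunks.
A = reshape(Float32.(1:180*90*10), 180, 90, 10)
tiles = split_tiles(A, (90, 90))

println(length(tiles), " tiles of size ", size(tiles[1]))
```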

When defining dimensions for TileData variables, the horizontal dimensions should be the size of the tiles, and their values integers 1:n or 1:m for an nxm tile:

time = 1:10
dep = gridvars["RC"]
dims = [
    NCvar("i_c","1",tilesize[1],1:tilesize[1],Dict("long_name" => "Cartesian coordinate 1"),NCDatasets),
    NCvar("j_c","1",tilesize[2],1:tilesize[2],Dict("long_name" => "Cartesian coordinate 2"),NCDatasets),
    NCvar("dep_c","m",size(dep),dep,Dict("long_name" => "depth","standard_name" => "depth","positive" => "down"),NCDatasets),
    NCvar("time","days since 1992-01-01",Inf,time,Dict(("long_name" => "time","standard_name" => "time")),NCDatasets)
]

The latitude and longitude variables will be written to the file separately, their data specified by TileData structs:

tillat = TileData(gridvars["YC"],tilesize,grid)
varlat = NCvar("lat","degrees_north",dims[1:2],tillat,Dict("long_name" => "latitude"),NCDatasets)
tillon = TileData(gridvars["XC"],tilesize,grid)
varlon = NCvar("lon","degrees_east",dims[1:2],tillon,Dict("long_name" => "longitude"),NCDatasets)

Since the data for latitude and longitude are held in memory (in the gridvars dictionary), we can specify them directly. At construction, the TileData struct creates the mapping that determines which indices of the latitude and longitude data go into each tile. When a file is written, NCTiles uses this mapping to extract the chunk for that file. The corresponding NCvars should use dims[1:2] as their dimensions, corresponding to i_c and j_c.

The variable we want to write is in a binary data file, so we can use a BinData struct in the TileData for our variable:

# fnames, prec, and iosize describe the binary files, as for BinData in the Basic Example
vardata = TileData(BinData(fnames,prec,iosize),
                    tilesize,
                    grid)
myvar = NCvar(varname,"myunits",dims,vardata,Dict(),NCDatasets)

The final step is to collect the NCvars in a Dict and write them to the NetCDF files:

vars = Dict([varname => myvar,
             "lon" => varlon,
             "lat" => varlat])

savenamebase = "data/mytiledata"
write(vars,savenamebase)

The write function will create one file for each tile, using savenamebase as a prefix for the file path. It will append a zero-padded number to the end of the filename, along with the extension .nc. For this example we would have the files data/mytiledata.0001.nc, data/mytiledata.0002.nc, ..., data/mytiledata.0013.nc.
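The naming pattern described above can be reproduced with Base Julia's lpad. The helper tile_filename is hypothetical and only mirrors the file names listed in this section; it is not the function NCTiles uses internally.

```julia
# Hypothetical helper: prefix, four-digit zero-padded tile number, ".nc" extension.
tile_filename(base::AbstractString, itile::Integer) =
    string(base, ".", lpad(itile, 4, '0'), ".nc")

savenamebase = "data/mytiledata"
outfiles = [tile_filename(savenamebase, i) for i in 1:13]

println(first(outfiles))
println(last(outfiles))
```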

API / Functions

NCTiles.BinData - Type
BinData

Data structure containing a string or an array of strings (binary file names) as well as metadata needed to read a file.

NCTiles.BinData - Method
BinData(fnames::Union{Array{String},String},precision::Type,iosize::Tuple)

Construct a BinData struct for files that contain one field.

NCTiles.NCData - Type
NCData

Data structure containing a string or an array of strings (file names) of NetCDF files as well as information needed to read a file.

NCTiles.NCvar - Type
NCvar

Data structure containing information needed to write a NetCDF file. This includes a list of filenames (see BinData) if the data is not loaded into memory.

NCTiles.TileData - Type
TileData{T}

Data structure containing either a MeshArray struct or BinData struct (see vals), MeshArray structs describing the tile layout (tileinfo), and other information for reading/writing tile data.

NCTiles.TileData - Method
TileData(vals,tilesize::Tuple)

Construct a TileData struct. This first generates the tileinfo, precision, and numtiles fields.

Base.write - Method
write(myflds::Dict,savename::String;README="",globalattribs=Dict())

Creates NetCDF file and writes myflds and all their dimensions to the file.

Base.write - Method
write(myfld::NCvar,savename::String;README="",globalattribs=Dict())

Creates NetCDF file and writes myfld and all its dimensions to the file.

NCTiles.addData - Method
addData(v::Union{NCDatasets.CFVariable,NetCDF.NcVar},var::NCvar)

Fill a variable in a NetCDF file with data, using either NCDatasets.jl or NetCDF.jl.

NCTiles.addDim - Method
addDim(ds::NCDatasets.Dataset,dimvar::NCvar) # NCDatasets

Add a dimension to a NCDatasets.Dataset

NCTiles.addDim - Method
addDim(dimvar::NCvar)

Add a dimension to a NetCDF file using NetCDF.jl

NCTiles.addDimData - Method
addDimData(v::Union{NCDatasets.CFVariable,NetCDF.NcVar,Array},var::NCvar)

Add dimension data to predefined dimensions in a NetCDF file.

NCTiles.addVar - Method
addVar(ds::NCDatasets.Dataset,field::NCvar)

Add a variable to a NetCDF file using NCDatasets.jl

NCTiles.addVar - Method
addVar(field::NCvar,dimlist::Array{NetCDF.NcDim})

Add a variable with dimensions dimlist to a NetCDF file using NetCDF.jl

NCTiles.addVar - Method
addVar(field::NCvar)

Add a variable and its dimensions to a NetCDF file using NetCDF.jl

NCTiles.createfile - Function
createfile(filename, field::Union{NCvar,Dict{String,NCvar}}, README;
            fillval=NaN, missval=NaN, itile=1, ntile=1)

Create NetCDF file and add variable + dimension definitions using either NCDatasets.jl or NetCDF.jl

NCTiles.readbin - Function
readbin(fname::String,prec::Type,iosize::Tuple,fldidx=1)

Read in a binary file to an Array.

NCTiles.readbin - Function
readbin(flddata::BinData,tidx=1)

Read in a binary file as an array as specified by BinData

NCTiles.readncfile - Function
readncfile(fname,backend::Module=NCDatasets)

Read in a NetCDF file and return variables/dimensions as NCvar structs, and file attributes as Dict. Large variables/dimensions are not loaded into memory. This can use either NCDatasets.jl or NetCDF.jl
