The CSIRO netCDF/OPeNDAP interface to matlab

Version 3.33

12 January 2007

Introduction

Summary of functions

Installation

Portability and known problems

Revision history

Alternative ways of accessing netCDF and OPeNDAP data

Disclaimer

People involved in the development of the interface

Contact details

Matlab links


Introduction

The CSIRO interface is used in a matlab session to retrieve data from either a local netCDF file or via an OPeNDAP/DODS server. The same commands are used for either type of access in almost every case (some small differences are discussed here).

The interface has options for automatically handling missing values, scalefactors, and the permutation of hyperslabs. It also has a simple syntax.

The method of installing the software is described here.

There are other ways of accessing netCDF files and OPeNDAP/DODS data and some of them are discussed briefly here.


Summary of functions

Basic functions

There are six basic functions which are used to access datasets either from locally held netCDF files or via an OPeNDAP/DODS server.

If dealing with a netCDF file then the first argument to each function will be a file name. For example, '/home/netcdf-data/sst_cac_recon_ltm.nc' is a full file name (including a path) to a certain netCDF file at the CSIRO Marine Labs. The same file is available via an OPeNDAP/DODS server with the url 'http://www.marine.csiro.au/dods/nph-dods/dods-data/climatology-netcdf/sst_cac_recon_ltm.nc'. In the examples that follow we will use this file.

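The six basic functions are:

  1. inqnc: interactively describes the structure of a netCDF file or OPeNDAP dataset.

  2. getnc: retrieves the data of a variable, or a hyperslab of it.

  3. timenc: returns the time vector and base date of a file that follows the COARDS conventions.

  4. attnc: returns the attributes of a variable or the global attributes.

  5. ddsnc: returns a structure describing the variables and dimensions.

  6. whatnc: lists the netCDF files in the current directory and in the common data set.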

For a more detailed description of the functions and some examples just follow the links. For an introduction it is suggested that you look at the functions in the order that they are listed above. Of course documentation is also available using the matlab help facility.

Auxiliary functions

There are also some auxiliary functions which are listed below. Use the matlab help facility for a more detailed description of them. You will not usually want to call these directly although some of the time related functions may be useful on occasion.


inqnc

inqnc is an interactive function that is used to find out about the structure of a netCDF file or OPeNDAP dataset. In the latter case you could use a web browser for the same purpose. Try clicking on the url 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc' to see this structure. The same information is found in the matlab example below. Of course, the output from inqnc will be almost identical if we look at the netCDF file on a local disk.

Example

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> inqnc(file)
--- Global attributes ---
source: Test program

The 5 dimensions are 1) dim_unlmited = 3 2) depth1 = 12 3) depth2 = 11 4) dim3 = 3 5) dim4 = 4.
dim_unlmited is unlimited in length

----- Get further information about the following variables -----

-1) None of them (no further information)
0) All of the variables
1) time 2) u 3) ureverse
4) uchar1 5) uchar2 6) uchar3
7) ushort 8) ulong 9) udouble
10) no_atts 11) big_var 12) depth1
13) depth2 14) dim3

Select a menu number: 1

--- Information about time(dim_unlmited ) ---

*units: days since 1990-1-1 00:00:0.0 *long_name: Time

----- Get further information about the following variables -----

-1) None of them (no further information)
0) All of the variables
1) time 2) u 3) ureverse
4) uchar1 5) uchar2 6) uchar3
7) ushort 8) ulong 9) udouble
10) no_atts 11) big_var 12) depth1
13) depth2 14) dim3

Select a menu number: 2
--- Information about u(depth1 depth2 ) ---

*long_name: u,5_januarys *units: cm/sec
*ml__FillValue: 10000000270000000 *missing_value: 10000000270000000
*valid_range: -10000000270000000 10000000270000000
*test_double: 100 2000 *test_short: 25 -3 19
*test_long: -4 333 -17 *scale_factor: 3
*add_offset: 0.5

----- Get further information about the following variables -----

-1) None of them (no further information)
0) All of the variables
1) time 2) u 3) ureverse
4) uchar1 5) uchar2 6) uchar3
7) ushort 8) ulong 9) udouble
10) no_atts 11) big_var 12) depth1
13) depth2 14) dim3

Select a menu number: -1

getnc

Introduction

getnc retrieves data in two ways. It can be used interactively to retrieve data from a netCDF file.

getnc is more commonly used as a function call - it can then retrieve data from both netCDF and OPeNDAP files. Because many options are available getnc can take up to 11 input arguments (although most have default values). To make things easier for the user there are various ways of specifying these arguments. Finally, a number of examples are given.

Interactive use

To retrieve data interactively the user simply types in 

>> val = getnc(file);

where file is a string containing the name of the netCDF file. From there the user is prompted for more information.

Arguments - meanings and defaults

There are 11 variables that getnc must know. Don't be frightened however as there are some easy ways to specify them and all but two have defaults. The variables are:

  1. file: This is a string containing the name of the netCDF file or the URL to the OPeNDAP dataset. It does not have a default. If describing a netCDF file it is permissible to drop the ".nc" suffix but this is not recommended.

  2. varid:  This may be a string or an integer. If it is a string then it should be the name of the variable in the netCDF file or OPeNDAP dataset. The use of an integer is a deprecated way of accessing netCDF file data; if used, the integer must be the menu number of the n dimensional variable as shown by a call to inqnc.

  3. bl_corner: This is a vector of length n specifying the hyperslab corner with the lowest index values (the bottom left-hand corner in a 2-space).  The corners refer to the dimensions in the same order that these dimensions are listed in the inqnc description of the variable. For a netCDF file this is the same order that they are returned in a call to "ncdump". With an OPeNDAP dataset it is the same order as in the DDS. Note also that the indexing starts with 1 - as in matlab and fortran, NOT 0 as in C. A negative element means that all values in that direction will be returned.  If a negative scalar (or an empty array) is used this means that all of the elements in the array will be returned. This is the default, i.e., all of the elements of varid will be returned.

  4. tr_corner: This is a vector of length n specifying the hyperslab corner with the highest index values (the top right-hand corner in a 2-space). A negative element means that the returned hyperslab should run to the highest possible index (this is the default). Note, however, that the value of an element in tr_corner will be ignored if the corresponding element in bl_corner is negative.

  5. stride: This is a vector of length n specifying the interval between accessed values of the hyperslab (sub-sampling) in each of the n dimensions.  A value of 1 accesses adjacent values in the given dimension; a value of 2 accesses every other value; and so on. If no sub-sampling is required in any direction then it is allowable to just pass the scalar 1 (or -1 to be consistent with the bl_corner and tr_corner notation). Note, however, that the value of an element in stride will be ignored if the corresponding element in bl_corner is negative.

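  6. order: This controls the order of the dimensions in the returned array. With the default (-1) the dimensions are in the order shown by inqnc; with -2 they are reversed. It is usually best to keep the default.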

  7. change_miss: Missing data are indicated by the attributes _FillValue, missing_value, valid_range, valid_min and valid_max. The action to be taken with these data is determined by change_miss: 1 returns them unchanged, 2 replaces them with NaN (the default) and 3 replaces them with new_miss.

  8. new_miss: This is the value given to missing data if change_miss == 3.

  9. squeeze_it: This specifies whether the matlab function "squeeze" should be applied to the returned array. This will eliminate any singleton array dimensions and possibly cause the returned array to have fewer dimensions than the full array.

  10. rescale_opts: This is a 2 element vector specifying whether or not rescaling is carried out on retrieved variables and certain attributes. The relevant attributes are '_FillValue', 'missing_value', 'valid_range', 'valid_min' and 'valid_max'; they are used to find missing values of the relevant variable. The option was put in to deal with files that do not follow the netCDF conventions (usually because the creator of the file has misunderstood the convention). For further discussion of the problem see here. Only use this option if you are sure that you know what you are doing.

  11. err_opt: This is an integer that controls the error handling in a call to getnc.

Arguments - ways of specifying


Specifying up to 11 arguments to getnc can be complicated and confusing. To make the process easier getnc will accept a variety of types of input. These are given as follows:

>> values = getnc(file, varid, bl_corner, tr_corner, stride, order, change_miss, new_miss, squeeze_it, rescale_opts, err_opt);

>> values = getnc(file, varid);

If you want non-default behaviour for one or more of the later arguments then you can do something like:

>> values = getnc(file, varid, -1, -1, -1, -1, change_miss, new_miss);

In this case 4 arguments are specified and the default values are used for the other 7.

The arguments can also be passed as the fields of a structure:

>> x.file = 'fred.nc';
>> x.varid = 'foo';
>> x.change_miss = 1;
>> values = getnc(x);

This specifies 3 arguments and causes defaults to be used for the other 8.
Note that it is possible to mix the usual arguments with the passing of a structure - it is only necessary that the structure be the last argument passed. We could achieve the same effect as above by doing:

>> x = [];
>> x.change_miss = 1;
>> values = getnc('fred.nc', 'foo', x);
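
The idea of overriding a few defaults with a structure can be sketched in plain matlab. This is only an illustration of the pattern and not the code inside getnc; the names defaults and opts are made up for the example, and the default values are those quoted in this document (with -1 standing for "use the default").

```matlab
% Illustrative defaults (values as described in this document).
defaults.bl_corner   = -1;
defaults.tr_corner   = -1;
defaults.stride      = -1;
defaults.order       = -1;
defaults.change_miss = 2;
defaults.squeeze_it  = 1;
defaults.err_opt     = 2;

% The caller only names the options that differ from the defaults.
x = [];
x.change_miss = 1;

% Copy every field supplied by the caller over the matching default.
opts = defaults;
names = fieldnames(x);
for k = 1:numel(names)
    opts.(names{k}) = x.(names{k});
end

disp(opts)
```

Only change_miss differs from the defaults in opts; every other field keeps its default value.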

Examples

In the following examples we use our standard OPeNDAP file "http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc" to illustrate the usage of getnc.

The simplest command line call to make is the following:

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> u = getnc(file, 'u');

The first argument specified is the file name or url. The second argument is the name of the variable - we could have found this by using inqnc. The result is that the entire contents of the u variable will be returned to the matlab session.

Alternatively we could have passed a structure to getnc to get the same answer.

>> x.file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> x.varid = 'u';
>> u = getnc(x);

We may only want a part of the variable and that is what the 3 arguments (bl_corner, tr_corner, stride) are about. If we use inqnc to consider the u variable described in our example file we see that it has two dimensions (depth1 depth2) in that order. We say that the variable is in a 2-dimensional rectangle. We also saw that there are 12 and 11 points in these two directions respectively. Thus we can imagine extracting a subset of the data known as a hyperslab. The argument bl_corner specifies the bottom left-hand corner of the hyperslab, tr_corner specifies the top right-hand corner and stride specifies the sub-sampling. An example to illustrate this is shown below.

>> u = getnc(file, 'u', [-1 3], [-1 9], [-1 2]);
>> size(u)
ans =
12 4

The 1st element in each of these arguments is -1 to indicate that we want to retrieve every point in that direction. Hence the 1st dimension of u is of length 12 – the full number of elements in the depth1 dimension. Now bl_corner(2) = 3, tr_corner(2) = 9 and stride(2) = 2. This means that in the depth2 direction we want every second point from the 3rd to the 9th, i.e., points 3, 5, 7 and 9. Hence the 2nd dimension of u is of length 4.
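
The index arithmetic behind this example can be reproduced in plain matlab. The sketch below is an interpretation of the argument descriptions given earlier and is not the code used by getnc; dim_len and idx are names invented for the illustration.

```matlab
dim_len   = [12 11];   % lengths of depth1 and depth2
bl_corner = [-1 3];
tr_corner = [-1 9];
stride    = [-1 2];

idx = cell(1, numel(dim_len));
for k = 1:numel(dim_len)
    if bl_corner(k) < 0
        % Negative corner: take the whole dimension and ignore
        % tr_corner(k) and stride(k).
        idx{k} = 1:dim_len(k);
    else
        last = tr_corner(k);
        if last < 0
            last = dim_len(k);      % run to the highest index
        end
        step = max(stride(k), 1);   % assumption: negative stride means no sub-sampling
        idx{k} = bl_corner(k):step:last;
    end
end

disp(idx{2})                  % indices kept in the depth2 direction
disp(cellfun(@numel, idx))    % size of the resulting hyperslab
```

The second index list is 3, 5, 7, 9 and the hyperslab size is 12 by 4, in agreement with the size(u) output above.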

The next argument to discuss is order. In general it is best not to use this option and just use the default (-1). The option allows you to reverse the dimensions in the returned value. Since netCDF files store data in row-major order but matlab does the opposite, it is possible, in principle, to make some efficiencies when retrieving data from a local netCDF file. However this is rarely significant and the option is only retained for backwards compatibility with older versions of getnc. (For OPeNDAP files setting order = -2 is always less efficient than -1.)

The following example illustrates this.

>> u = getnc(file, 'u'); 
>> size(u)
ans =
12 11
>> ut = getnc(file, 'u', -1, -1, -1, -2);
>> size(ut)
ans =
11 12

Note that in the 2nd case we have used -1, -1, -1 for the bl_corner, tr_corner and stride arguments to indicate that we want the default case of getting all possible values. We could have passed a structure to get the same result, as below:

>> x.file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> x.varid = 'u';
>> x.order = -2;
>> ut = getnc(x);
>> size(ut)
ans =
11 12
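
For a two-dimensional variable, reversing the dimensions amounts to a transpose. The following sketch uses a made-up 12 by 11 array in place of u and the standard matlab function permute; it illustrates the effect of order = -2 as described above rather than how getnc obtains it.

```matlab
u  = reshape(1:12*11, 12, 11);     % stand-in for the 12 x 11 variable u
ut = permute(u, ndims(u):-1:1);    % reverse the order of the dimensions

disp(size(ut))
disp(isequal(ut, u.'))
```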

The default behaviour of getnc is to replace missing values in the data with NaNs. (By missing values we mean those values equal to the _FillValue or missing_value attribute or outside the range determined by the valid_min, valid_max or valid_range attribute. This is discussed in the netCDF user's guide at http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attribute-Conventions.html#Attribute-Conventions.) The pair of arguments change_miss and new_miss can change this. If change_miss = 1 then any missing values are returned unchanged. If change_miss = 2 then they are changed to a NaN (the default, also available as change_miss = -1). If change_miss = 3 then any missing values are replaced by new_miss.

This is illustrated in the following example – note that we pass a structure, x, here and have made sure that x is empty at the start.

>> x = [];
>> x.file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> x.varid = 'u';
>> x.bl_corner = [12 11];
>> x.tr_corner = [12 11];
>> u = getnc(x)
u =
NaN

We use the simplest version of getnc to retrieve the last value of the array – we get a NaN because the value actually stored in the dataset is marked as a missing value. Next we try change_miss = 1,

>> x.change_miss = 1;
>> u = getnc(x)
u =
3.0000e+16

Now, 3.0000e+16, the value actually stored in the file, is returned. Finally, we use change_miss = 3 to cause the missing value to be replaced by 1.5 in our matlab array.

>> x.change_miss = 3;
>> x.new_miss = 1.5;
>> u = getnc(x)
u =
1.5000
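
The three settings of change_miss can be mimicked on a small array. This sketch only handles a single fill value and ignores valid_range, scale_factor and add_offset, so it is a simplification of what getnc does; raw, fill_value and results are names invented for the illustration.

```matlab
raw        = [0.2 2.5 1e16 4.0];   % 1e16 plays the role of the fill value
fill_value = 1e16;
new_miss   = 1.5;

results = cell(1, 3);
for change_miss = 1:3
    vals = raw;
    is_missing = (raw == fill_value);
    switch change_miss
        case 1
            % leave missing values unchanged
        case 2
            vals(is_missing) = NaN;        % the default behaviour
        case 3
            vals(is_missing) = new_miss;   % user-supplied replacement
    end
    results{change_miss} = vals;
end

disp(results{1})
disp(results{2})
disp(results{3})
```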

The next argument, squeeze_it, deals with singleton dimensions (i.e., those of length 1). If squeeze_it = 1 (the default behaviour) then any singleton dimension will be eliminated as if the matlab function squeeze had been applied. If squeeze_it = 0 then the singleton dimensions will remain. This is illustrated in the following examples.

>> big_var = getnc(file, 'big_var', [-1 2 2 5 -1], [-1 2 2 5 -1]);
>> size(big_var)
ans =
12 3
>> big_var = getnc(file, 'big_var', [-1 2 2 5 -1], [-1 2 2 5 -1], -1, -1, -1, -1, 0);
>> size(big_var)
ans =
3 1 1 1 12

This option is not really necessary any more because matlab has the squeeze function. It was originally put in to enable backwards compatibility with earlier versions of getnc written before matlab dealt with multi-dimensional arrays and so we are stuck with it.

From version 3.3 onwards getnc has given the user some control over error handling. In the examples below we ask for a non-existent variable. The default behaviour (err_opt == 2) returns an empty array and prints a warning message as below.

>> junk = getnc(file, 'junk')
WARNING: junk is not a variable in http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc
junk =
[]

Setting err_opt == 1 causes getnc to be aborted due to the non-existent variable as seen below.

>> x = []; 
>> x.err_opt = 1;
>> junk = getnc(file, 'junk', x)
??? Error using ==> getnc_s>error_handle
ERROR: junk is not a variable in http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc
Error in ==> getnc_s at 872
values = error_handle([], mess_str, [], err_opt);
Error in ==> getnc at 211
values = getnc_s(varargin);

Finally, we can see the dangerous option err_opt == 3 which causes an empty array to be returned and no error message.

>> x.err_opt = 3; 
>> junk = getnc(file, 'junk', x)
junk =
[]

This might be used when getnc is called in a loop and you don't want to get a large number of error messages. Of course you should be careful to handle the returned values properly.
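
A loop of this kind might look like the sketch below. So that it runs without a server, the structure fake_data is a hypothetical stand-in for the dataset, and the lookup in it replaces a call such as getnc(file, names{k}, x) with x.err_opt = 3. The point is the isempty test on each returned value.

```matlab
% Hypothetical stand-in for the contents of a dataset.
fake_data.u    = ones(12, 11);
fake_data.time = [0; 1; 40.5];

names   = {'u', 'junk', 'time'};
missing = {};
for k = 1:numel(names)
    % With the real interface this would be something like
    %   vals = getnc(file, names{k}, x);   % with x.err_opt = 3
    if isfield(fake_data, names{k})
        vals = fake_data.(names{k});
    else
        vals = [];
    end
    if isempty(vals)
        % No message was printed, so record the failure ourselves.
        missing{end + 1} = names{k};
        continue
    end
    fprintf('%s: %d values\n', names{k}, numel(vals));
end
fprintf('not found: %s\n', missing{:});
```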


timenc

timenc finds the time vector and the corresponding base date for a netCDF file or DODS/OPeNDAP dataset that follows the COARDS conventions. In practice this means that the time-like variable should have a units attribute of a certain form. An example is:

'seconds since 1992-10-8 15:15:42.5 -6:00'.

This indicates seconds since October 8th, 1992 at 3 hours, 15 minutes and 42.5 seconds in the afternoon in the time zone which is six hours to the west of Coordinated Universal Time (i.e. Mountain Daylight Time). The time zone specification can also be written without a colon using one or two-digits (indicating hours) or three or four digits (indicating hours and minutes). Instead of 'seconds' the string may contain 'minutes', 'hours', 'days' and 'weeks' and all of these may be singular or plural and have capital first letters. The letters 'UTC' or 'UT' are allowed at the end of the string, but these are ignored.

It is possible to have many different types of calendars but timenc only implements five at present.

These are necessary because there is some confusion with dates before October 15 1582 when the Gregorian calendar was introduced. A problem also arises when the reference date in the units attribute is before this. timenc deals with this by recognising some of the CF conventions and returns different answers depending on the value of the calendar attribute of the time-like variable. Also, some numerical models like to pretend that every year has the same number of days - 365, 366 and 360 are all used.

  1. calendar = 'standard', 'gregorian' or is not specified. In this case the relevant calculations are done for the true Gregorian calendar as decreed by Pope Gregory XIII. This has a discontinuity so that the day after 4 October 1582 is 15 October 1582. This is the calendar almost universally used today and what udunits works with today. timenc has worked this way since revision 1.10 in 2000.

  2. calendar = 'proleptic_gregorian'. In this case the relevant calculations are done using the matlab functions datenum and datevec which simply extend the way our present calendar works backwards into the past. This is called the proleptic Gregorian calendar. Accordingly, dates are continuous, i.e., the day after 4 October 1582 is 5 October 1582, but they do NOT correspond to historical time anywhere. There is also a year zero. timenc used to work this way before revision 1.10 in the year 2000 and I believe that udunits also did at some stage in the past.

  3. calendar = 'noleap' or '365_day'. Here it is assumed that every year has 365 days.

  4. calendar = 'all_leap' or '366_day'. Here it is assumed that every year has 366 days.

  5. calendar = '360_day'. Here it is assumed that every year has 360 days and every month has 30 days.

Note that other values of the calendar attribute produce an error message. This can usually be avoided by the user specifying the calendar explicitly in the call to timenc.
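
As a worked example of the 360_day calendar, an offset in days from a base date can be turned into a year, month and day using only integer arithmetic. The base date and the offsets here are made up, and the sketch is not taken from timenc.

```matlab
base = [2000 1 1];                       % hypothetical base date: year, month, day
offset_days = [0 29 30 359 360 725];     % hypothetical 'days since' values

% Count days from the start of year 0 in a calendar of 12 months of 30 days.
total = base(1) * 360 + (base(2) - 1) * 30 + (base(3) - 1) + offset_days;

year  = floor(total / 360);
month = floor(mod(total, 360) / 30) + 1;
day   = mod(total, 30) + 1;

disp([year(:) month(:) day(:)])
```

An offset of 359 days lands on 30 December 2000 and 360 days on 1 January 2001, since every year has exactly 360 days.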

The general form of a timenc call is:

[gregorian_time, serial_time, gregorian_base, serial_base, sizem, serial_time_jd, serial_base_jd] = timenc(file, time_var, corner, end_point);

A full description of the options can be found by typing help timenc in matlab.

Examples

In the following examples we use our standard OPeNDAP file test_1.nc.

The simplest command line call to make is the following:

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> [gregorian_time, serial_time] = timenc(file);

Note that since the time-like variable is named 'time' we did not even have to put in its name. We now look at the matrix that contains the gregorian time.

>> gregorian_time(1, :)
ans =
1.0e+03 *
1.9900 0.0010 0.0010 0 0 0.0000
>> gregorian_time
ans =
1990 1 1 0 0 0
1990 1 2 0 0 0
1990 2 10 12 0 0

Each row of the matrix gregorian_time contains a time in year, month, day, hour, minute, second format. Thus the last date is for noon, 10 February, 1990. We can see the same thing by looking at the vector serial_time.

>> size(serial_time)
ans =
3 1
>> datestr(serial_time)
ans =
01-Jan-1990 00:00:00
02-Jan-1990 00:00:00
10-Feb-1990 12:00:00

serial_time gives the time in the format used by matlab's functions datenum, datevec and datestr. Thus we can use datestr to print out the dates.

Here we get the 1st and 2nd dates.

>> [gregorian_time, serial_time] = timenc(file, 'time', 1, 2);
>> datestr(serial_time)
ans =
01-Jan-1990 00:00:00
02-Jan-1990 00:00:00
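
The link between the units attribute and serial_time can be checked by hand with datenum. The sketch assumes that the time variable of the test file holds the values 0, 1 and 40.5 days; these numbers are inferred from the dates printed above and are not read from the file.

```matlab
% units: 'days since 1990-1-1 00:00:0.0'
serial_base = datenum(1990, 1, 1, 0, 0, 0);
time_days   = [0; 1; 40.5];            % assumed contents of the time variable

serial_time    = serial_base + time_days;
gregorian_time = datevec(serial_time);

disp(datestr(serial_time))
disp(gregorian_time)
```

Because all of these dates are later than 1582, the standard and proleptic Gregorian calendars agree and datenum can be used directly.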

attnc

attnc returns selected attributes of a netCDF file or DODS/OPeNDAP dataset. The first form of an attnc call is:

att_val = attnc(file, var_name, att_name);

In this case it simply returns the value of a named attribute. You can use the var_name 'global' to retrieve a global attribute.

The second form of the call is:

[att_val, att_name_list] = attnc(file, var_name);

In this case all of the attributes and their names are returned in cells named att_val and att_name_list.

Examples

In the following examples we use our standard OPeNDAP file test_1.nc.

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> [att_val, att_name_list] = attnc(file, 'u');
>> length(att_val)
ans =
10
>> att_val{1}
ans =
u,5_januarys
>> att_name_list{1}
ans =
long_name

Here we retrieve all of the attributes for the variable u. We see that there are 10 elements in each cell and that the first attribute has name long_name and is a string containing u,5_januarys.

By not giving a variable or attribute name we get information about all of the global attributes.

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> [att_val, att_name_list] = attnc(file);
>> att_name_list
att_name_list =
'source'
>> att_val
att_val =
'Test program'

In this case there is only one global attribute named source and it is a string containing Test program.

By giving the variable and attribute names we can get simply the value of the attribute.

>> [att_val, att_name_list] = attnc(file, 'u', '_FillValue');
>> att_val
att_val =
1.0000e+16

A single global attribute can be retrieved by using the name 'global' in the call to attnc as below.

>> [att_val, att_name_list] = attnc(file, 'global', 'source'); 
>> att_val
att_val =
Test program

ddsnc

ddsnc returns information about a netCDF file or DODS/OPeNDAP dataset. The general form of a ddsnc call is:

desc = ddsnc(file)

desc is a matlab structure. For an OPeNDAP data set desc will contain all of the information in the DDS (Dataset Descriptor Structure). For a netCDF file desc will be almost identical. (It cannot be exactly the same since netCDF files are not identical to OPeNDAP data sets.)

Examples

Information about our standard test file can be found as follows:

>> file = 'http://www.marine.csiro.au/dods/nph-dods/dods-data/test_data/test_1.nc';
>> desc = ddsnc(file)
desc =
variable: [1x14 struct]
dimension: [1x5 struct]

desc has 2 fields - variable and dimension. Looking at one element we see

>> desc.variable(2)
ans =
type: 'Float32'
name: 'u'
dim_statement: {'depth1 = 12' 'depth2 = 11'}
dim_idents: [2x1 double]

The first 2 fields tell us that the variable is named 'u' and is a 32 bit float (single precision real). The dim_statement field tells us that the u variable has 2 dimensions in the order given. For dim_idents we see

>> desc.variable(2).dim_idents
ans =
2
3

These integers refer to the dimensions of the u array. Looking at desc.dimension(2) and desc.dimension(3) we see

>> desc.dimension(2)
ans =
name: 'depth1'
length: 12
>> desc.dimension(3)
ans =
name: 'depth2'
length: 11

That is, index 2 points us to the 2nd dimension, depth1, which has length 12. (We saw the same information in the dim_statement field earlier.) A generic program could then retrieve the information by setting:

>> ii = desc.variable(2).dim_idents(1);

and then referring to desc.dimension(ii).
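
Such a generic program might look like the following sketch. So that it can run without a data file, desc is built by hand here with just the fields shown above for the u variable; with the interface installed it would come from desc = ddsnc(file). The name dim_len is invented for the example.

```matlab
% Hand-built copy of the relevant parts of the ddsnc output shown above.
desc.dimension(2).name   = 'depth1';
desc.dimension(2).length = 12;
desc.dimension(3).name   = 'depth2';
desc.dimension(3).length = 11;
desc.variable(2).name       = 'u';
desc.variable(2).dim_idents = [2; 3];

v = desc.variable(2);
dim_len = zeros(1, numel(v.dim_idents));
for k = 1:numel(v.dim_idents)
    ii = v.dim_idents(k);                     % index into desc.dimension
    dim_len(k) = desc.dimension(ii).length;
    fprintf('%s: dimension %d is %s (length %d)\n', ...
            v.name, k, desc.dimension(ii).name, dim_len(k));
end
```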


whatnc

whatnc lists all of the netCDF files (including compressed ones) in the current directory. It also lists all of the netCDF files in the common data set.

Example

Below is a possible listing returned by whatnc.

>> whatnc
----- current directory netCDF files -----
bar.cdf foo.cdf mycdf.cdf test_1.nc test_timenc.nc
----- current directory compressed netCDF files -----
EMPTY
----- common data set of netCDF files -----
bath_agso_2002.nc soc_climatology.nc
bath_agso_98.nc sst.mnmean.1981-present.nc

The list under the 1st heading shows all of the files in the current directory that seem to be netCDF files. This is based simply on whether they end in .cdf or .nc. Note that the .cdf suffix was used in the past to indicate a netCDF file but is no longer recommended.

The list under the 2nd heading shows all of the files that end in nc.gz, nc.Z, cdf.gz or cdf.Z. These are presumed to be compressed netCDF files.
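
The suffix tests described above can be sketched with regexp. The file names below are made up and the code is an illustration only, not the code used by whatnc.

```matlab
names = {'foo.cdf', 'test_1.nc', 'notes.txt', 'old.nc.gz', 'run2.cdf.Z'};

% Plain netCDF files end in .nc or .cdf.
is_nc = ~cellfun(@isempty, regexp(names, '\.(nc|cdf)$', 'once'));

% Compressed netCDF files end in nc.gz, nc.Z, cdf.gz or cdf.Z.
is_compressed = ~cellfun(@isempty, regexp(names, '(nc|cdf)\.(gz|Z)$', 'once'));

disp(names(is_nc))
disp(names(is_compressed))
```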

The 3rd list shows netCDF files in the area referred to as the common data directory. This directory will be searched by the inqnc, attnc and getnc commands and is set by the local system manager. This is done by simply editing the pos_cds.m file.


Installation

The interface has been installed on both unix and Windows pc systems. Installation is mostly a matter of copying the appropriate files to directories and then making them visible to matlab. Accordingly the experience should easily translate to other operating systems.

  1. Download either matlab_netCDF_OPeNDAP.tar.gz or matlab_netCDF_OPeNDAP.zip (the files in each are identical).

  2. Copy the file to where you keep your matlab source code and expand it using either gunzip and tar or unzip as appropriate.

  3. To be able to read netCDF files you need a version of mexnc that suits your operating system. A collection of suitable mex files can be found at http://mexcdf.sourceforge.net/downloads/. Note that to use the CSIRO interface it is only necessary to get the mex file and put it in your matlab path. The page describes other packages that are necessary to install the snctools and netCDF toolbox (which are well worth looking at in themselves - see Alternative ways of accessing netCDF and OPeNDAP data).

  4. To be able to access OPeNDAP files you will need suitable copies of the mex and binary files from the Matlab Command Line Tool package. There are two versions of this package and the interface will automatically detect whether you have installed the latest or earlier version. For the newest version the interface calls a mex file named loaddap which in turn calls a binary executable named writedap under linux or writedap.exe under Windows. In the older version these files are named loaddods and writeval but do essentially the same thing. There are other differences between the linux and Windows versions and these are described below.

Unix users

Simply put the appropriate loaddap mexfile (e.g., loaddap.mexglx) in a directory seen from your matlab path. The binary file writedap should also be put in the same directory. (The earlier versions are called loaddods.mex* and writeval.)

Windows users

For Windows things are more complicated. From Matlab Command Line Tool you will need both the Matlab Structs Tool package (with a name like ml-structs3.5.2-win32.zip) and the libdap library (with a name like libdap-prerequisites3.7.3.zip). To make things work we took loaddap.dll and writedap.exe from the first package and put them into a folder in the matlab search path. We then copied the dll files from the libdap prerequisite file to the same folder as loaddap.dll. Earlier versions of the Matlab Structs Tool package had things arranged differently and you needed loaddods.dll and writeval.exe. Note that there may be a bug in the OPeNDAP interface.

  5. If you have a local data set of netCDF files that you want to be accessible to matlab without the user having to specify the path name then you can edit the file pos_cds.m. This matlab function will be used by getnc, attnc, timenc, inqnc and whatnc when they are trying to find a given netCDF file.

  6. When the directory structure was expanded in step 2 a subdirectory named test was created. To test the installation go to this subdirectory, start matlab and type test_all. This gives you options to test both the netCDF and the OPeNDAP installations. It does that by reading some data from a supplied netCDF file or from an OPeNDAP server. The data are compared to those in a supplied mat-file. The most common error is setting up the matlab paths incorrectly so that part of the interface is not visible to matlab. The testing of the OPeNDAP installation may fail sometimes because of dropouts in the internet somewhere.

    If you get to the end successfully then test_all will give you a timing message. It is interesting to see how much slower it can be to access the data remotely via the OPeNDAP interface.

Note that when testing the OPeNDAP installation on a Windows PC there may be a number of warning messages. These are because of the bug described here. If you get to the timing message then things have worked except for the known bug.
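
Since path problems are the most common cause of failure, a quick check of what matlab can see may help before running test_all. The sketch uses the standard functions exist and which; the list of names is taken from this document, and loaddap would be loaddods for the older OPeNDAP package.

```matlab
needed = {'getnc', 'inqnc', 'attnc', 'timenc', 'ddsnc', 'whatnc', ...
          'pos_cds', 'mexnc', 'loaddap'};

found = false(1, numel(needed));
for k = 1:numel(needed)
    % exist returns 2 for an m-file and 3 for a mex file on the path.
    code = exist(needed{k}, 'file');
    found(k) = (code == 2) || (code == 3);
    if found(k)
        fprintf('%-8s %s\n', needed{k}, which(needed{k}));
    else
        fprintf('%-8s NOT FOUND on the matlab path\n', needed{k});
    end
end
```

Any name reported as not found points to a directory that still has to be added with addpath or through the matlab path dialog.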


Portability and known problems

Portability

The software in this package is entirely made up of matlab script files and works for all versions of matlab later than and including matlab 6.1 (R12). Portability problems can arise with the binary files mexnc, loaddap and writedap which must be downloaded from elsewhere.

Bugs

There are no known bugs.

(Before version 3.22 there was a problem on Windows PCs and some linux boxes when using OPeNDAP. The package could give the wrong result when reading variables which are character arrays. The problem is with the loaddap (loaddods) mex file and will presumably be fixed sometime in the future. At present there is a workaround in the timenc and getnc_s functions.)

Confusion with missing values

When reading some netCDF files getnc will return a missing value indicator (by default a NaN) in some places where there shouldn't be one. This is not due to a bug in getnc but occurs when the netCDF file is not following the attribute conventions (see http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html#Attribute-Conventions). Two relevant quotes from the documentation are:

The type of each valid_range, valid_min and valid_max attribute should match the type of its variable
(except that for byte data, these can be of a signed integral type to specify the intended range).

and

If _FillValue is defined then it should be scalar and of the same type as the variable.

To illustrate what this means and how a problem can occur consider the following extract from an example cdl file.

short airtemp(time, lat, lon) ;
airtemp:long_name = "Air temperature at surface" ;
airtemp:valid_range = -10000s, 10000s ;
airtemp:units = "degC" ;
airtemp:scale_factor = 0.01f ;
airtemp:_FillValue = 32766s ;

What has happened here is that the creator of the netCDF file has chosen to save space by storing the data as shorts (2 byte integers). The software reading the data will then multiply the integer values by the scale_factor of 0.01 to produce the floating point value of the air temperature. Since the integers can take values between -32768 and 32767, this can represent temperatures of between -327.68 and 327.67 degrees with a resolution of 0.01 degrees.

Note, however, that the valid_range goes from -10000 to 10000. Generic software interprets values outside of this range as faulty in some way and the default behaviour of getnc is to replace such values with a NaN. The creator of the file can use this to mark missing or contaminated data. Since the temperatures implied by these limits are -100 and 100 Celsius then the limits are “safe” since they represent physically unreasonable data.

This way of defining the valid_range is what is specified in the earlier quote.

A problem arises when the creator of the netCDF file misunderstands the attribute convention. They choose an “intuitive” definition of the attribute like:

airtemp:valid_range = -100.0f, 100.0f ;

Here they are thinking in terms of the true air temperature rather than the scaled version stored as integers. When getnc reads the valid_range attribute it then multiplies it by 0.01 and concludes that any temperatures outside the range of -1.0 to 1.0 are to be replaced by NaNs. Note that the same problem occurs when the file's creator makes the same error with other attributes – valid_min, valid_max, _FillValue and missing_value.
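
The arithmetic behind this problem can be reproduced with a few made-up packed values. The sketch follows the description above, in which the limits are scaled in the same way as the data; it is not the code used by getnc, and the sample values are hypothetical.

```matlab
scale_factor = 0.01;
packed = int16([-2500 50 1234 9999]);        % hypothetical stored shorts
temp   = double(packed) * scale_factor;      % temperatures in degC

% Correct file: valid_range is given in packed units (-10000s, 10000s).
good_range = [-10000 10000] * scale_factor;  % -100 to 100 degC
good = temp;
good(temp < good_range(1) | temp > good_range(2)) = NaN;

% Errant file: valid_range was written as -100.0f, 100.0f.
bad_range = [-100 100] * scale_factor;       % -1 to 1 degC
bad = temp;
bad(temp < bad_range(1) | temp > bad_range(2)) = NaN;

disp(good)
disp(bad)
```

With the correct attribute every sample survives; with the errant one only the value of 0.5 degrees lies inside the implied range of -1 to 1, so the other three are replaced by NaN.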

There are several workarounds for this problem. The simplest is to pass getnc the argument change_miss = 1. This will cause all values to be passed unchanged (apart from the rescaling implied by the scale_factor attribute). The disadvantage is that when very large values were used to indicate faulty data these will also be returned - in the example above you might end up with some temperatures greater than 100C.

The trickier, but more satisfactory, option is to use the rescale_opts option in getnc. It was designed to deal with errant netCDF files and is described here.


Revision history

The following is a partial history of revisions. I intend to keep it more up-to-date from version 3.0 onwards. In particular, bug fixes will be recorded.


Alternative ways of accessing netCDF and OPeNDAP data

There are a number of alternative ways of reading netCDF and OPeNDAP data into matlab. In most cases the time and computer resources taken to retrieve data will depend mostly on external factors such as internet bandwidth and disk access speed. Hence it would be surprising if one of these methods was significantly more efficient than any of the others.

netCDF

OPeNDAP


Disclaimer

This software is provided "as is" without warranty of any kind. It is covered by a general CSIRO Legal Notice and Disclaimer.


People involved in the development of the interface

The CSIRO matlab interface has been mostly written by Jim Mansbridge with some input from Peter McIntosh and Rose O'Connor (all of CSIRO).

Early versions of a mex file interface to netCDF (called xnetcdf) were worked on by Peter McIntosh and Jim Mansbridge. A series of much improved versions were created by Chuck Denham (USGS); these had names like mexcdf, mexcdf53 and ncmex. John Evans (Rutgers) has then developed the code further as mexnc and now maintains the project.

The Matlab Command Line Tool (from which loaddap and writedap are taken) was written by Glenn Flierl (MIT) and James Gallagher (URI).


Contact details

This web page is maintained by Jim Mansbridge, CSIRO Marine and Atmospheric Research.

Postal address: GPO Box 1538, Hobart, Tasmania 7001, Australia
Phone: +61-3-62 32 5416
Fax: +61-3-62 32 5123


This page is http://www.marine.csiro.au/sw/matlab-netcdf.html.



Last revised by Jim Mansbridge on 12/01/07


Further details on the research of the CSIRO Marine and Atmospheric Research are available through the CMAR Home Page.

For more information contact reception@marine.csiro.au or telephone +61-3-62325222. Unless otherwise indicated all contents in these web documents are copyright © 1997 CSIRO.