US Argo Data Assembly Center
Documentation
Error Handling

Problem Solving Guide
For Real-time Argo Operations


1) Minor problems - for floats using the ARGOS system

  • The collect software could not retrieve data from service ARGOS for a short period of time (less than the collect_limit).
Cause: Time out, no internet access, no name server, multiple logins, the computer at service ARGOS is down, or the IP address of the computer at service ARGOS changed.
Action: Wait if the email reports that the process was queued for later execution (see also below).
  • Data are received from a float that has been dead for a long time.
Action: Check if a new float with the same transmission ID has been deployed (positions are a good hint).
  • Data are received for a float that is at depth.
Action: Look at the hex file. If only one data line was added to the hex file, the transmission ID was corrupted during the data transmission (i.e. the data belong to another instrument). If many lines were received, the float may have a severe problem, or another instrument with the same transmission ID exists.
  • A profile does not have enough data lines.
Cause: The float may be on the shelf (if it is basically not moving), or it fails to reach its target depth (if it is moving too fast).
  • HEX2CNT could not convert data.
The line in the email reads: "        0070_16956       0          1           ts6         yes"
The message in the hex2cnt*stderror is usually:
"Program Execution Terminated Abnormally while Processing hex file"
The message in the error file for the profile can be one (or more) of the following:
ERROR: COMPOSE_ARRAY too few scans to form complete message, n_scans = 3 Can't decode message without frame 1
ERROR: COMPOSE_ARRAY frame missing, scan 2
ERROR: COMPOSE_ARRAY frame missing, scan 1
Action: Wait. In most cases enough data will be available for processing the next time.
  • The execution of the qc programs fails if no geographic position is in the data file (status 8 in QC_PROCESS email).
Action: Wait. In most cases a position is available the next time.

2) Problems that require action - for all floats

  • No meta file available.
Action: Find the meta file and put it into the operational directory. Start the manual process if necessary (see below).
  • A float does not have a WMO number when a profile is processed.
Solution: Get a WMO number; enter the number in the meta file. Start the manual process if necessary (see below).
  • PHY2GTSFMT failed.
Cause: Most likely the file PHY2GTSFMT_TABLE is the problem. It must end with a carriage return, and it must not contain empty lines (these can appear after manual entries).
Action: Correct the file and execute 'cron/scripts/cron_argo.csh gts'.
  • A profile older than 30 days is in PHY2GTSFMT_TABLE (i.e. it has not been converted to GTS format yet).
Solution: Check if the date is correct (e.g. the year may be wrong in manual entries).
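
The two format requirements on PHY2GTSFMT_TABLE described above (a terminated last line, no empty lines) can be checked before rerunning the gts step. The following is a minimal sketch, not part of the operational software. It assumes that the "carriage return at the end" means the last line ends with a newline character; the function names check_table_text and check_table_file are hypothetical.

```python
from pathlib import Path
from typing import List


def check_table_text(text: str) -> List[str]:
    """Return a list of format problems found in the table text (empty if none)."""
    problems = []
    if not text:
        return problems  # an empty table has no lines to check
    # Assumption: "carriage return at the end" means a trailing newline.
    if not text.endswith("\n"):
        problems.append("missing line terminator at end of file")
    # Drop the final terminator so it is not counted as an empty line.
    body = text[:-1] if text.endswith("\n") else text
    for number, line in enumerate(body.split("\n"), start=1):
        if line.strip() == "":
            problems.append(f"empty line at line {number}")
    return problems


def check_table_file(path: str) -> List[str]:
    """Read a table file and report the same problems as check_table_text."""
    return check_table_text(Path(path).read_text())
```

An empty result list means the file passes both checks; any reported line numbers point at the entries to correct by hand.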

3) Problems that require action - for floats using the ARGOS system

a) simple:
  • A float does not have a WMO number when a profile is processed.
Solution: Get a WMO number; enter the number in the meta file. Run the manual process to create the cnt file, phy file and qc file. Add the profile to the PHY2GTSFMT_TABLE. Note: Profiles with no WMO number (i.e. WMO number is zero) are not added to PHY2GTSFMT_TABLE and they are not transmitted to GTS.
  • The collect software could not retrieve data from service ARGOS for a long period of time (close to, or longer than, the collect_period).
Cause: Time out, no internet access, no name server, multiple logins, the computer at service ARGOS is down, or the IP address of the computer at service ARGOS changed.
Action: Find out why. Increase the collect_period to ensure that all data are received. Note: No automatic restart will be scheduled if the collect_limit is exceeded.
b) complicated:
  • Two floats have the same transmission ID number, and data were processed.
Cause: Recycling of transmission ID numbers.
Example: Float 200 and float 46 (dead) both have transmission ID number 00706. The new data from float 200 were added to the hex file of float 46.
Solution: Edit the hex file for float 46 (the old one) and delete the new data lines (those which belong to float 200). Move the old float from "floatno_argoid_list" to "dead_floatno_argoid_list" (to prevent this from happening, always check for duplicates when adding new floats to "floatno_argoid_list"). Run PRV2HEX (in the tests directory) to create the hex file for float 200. Then run it through HEX2CNT, CNT2PHY and RUN_QC (all in the tests directory). Finally update PACKQC2FTP_TABLE, PHY2GTSFMT_TABLE, and QC2MATLAB_TABLE.
  • A qc file has a bad profile position (failed Speed Check).
Solution: Edit phy and cnt files by changing the location class (column CLASS). The policy is to put a minus in front of the class (0 has to be replaced by -4). Then the QC has to be run again. Positions with negative location classes are ignored by the QC software.
  • A qc file has a bad position that is not the profile position (failed Speed Check).
Solution: Edit qc, phy and cnt files by changing the location class (column CLASS). The policy is to put a minus in front of the class (0 has to be replaced by -4). Change the speed check flag to 1 in the qc file.
  • ARGO process crashed while trying to process an old prv file that was compressed (by an automatic process).
Solution: Delete the last line in the changed hex files. This forces the PRV2HEX program to add data to the hex files. Then run ARGO_PROCESS again.
  • On the web site, at the bottom of the Float Status Table, the following text appears:
Numbers of floats with profile number problem (total 20):
3 9 10 38 45 52 55 60 63 72 77 80 81 83 95 102 109 113 131 161
For these floats some profiles are missing. Either one or more profiles have no position (i.e. no qc file can be generated), or no data were received for one or more profiles.
Solution: Spot-check individual floats to determine the reason.
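
The location class policy used above for positions that failed the Speed Check (put a minus in front of the class, with 0 replaced by -4) can be written down as a small helper. This is only an illustrative sketch: the function name flag_location_class is hypothetical, and it assumes the CLASS column is handled as a text token. It does not edit the qc, phy or cnt files itself.

```python
def flag_location_class(location_class: str) -> str:
    """Return the location class marked as bad, following the policy above."""
    value = location_class.strip()
    if value.startswith("-"):
        return value  # already flagged; leave unchanged
    if value == "0":
        return "-4"  # special case stated in the guide: 0 is replaced by -4
    return "-" + value  # all other classes get a minus in front
```

For example, class 2 becomes -2 and class 0 becomes -4. A class that is already negative is returned unchanged, so applying the helper twice is harmless.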

4) Exit status - for floats using the ORBCOMM system

  •   1    OK
  •   2    WMO ID in 'nc' file is a 'fill' value (0)
Action: Inform WHOI.
  •   3    WMO number not found in meta files
Action: Search in preliminary meta files. Inform WHOI.
  •   4    WMO ID in 'nc' filename differs from WMO ID inside the 'nc' file
Action: Inform WHOI. The ID in the filename may be a transmission ID.
  •   5    WMO ID in 'nc' file cannot be alphanumeric
Action: Inform WHOI. The ID may be a transmission ID.
  •   6    WMO instrument type from meta file and 'nc' file do not match
Action: Inform WHOI. Which one is correct?
  •   7    WMO number from 'meta' file and 'nc' file do not match.
Action: Inform WHOI. The ID in the filename may be a transmission ID.
  •   8    Number of columns is not 3 (p, t, c)
Action: Inform WHOI. Was a new parameter introduced?
  •   9    Profile direction (up/down) is inconsistent with the filename convention
Action: Inform WHOI. Which one is correct?
  • 10    Not a NetCDF file
Action: Inform WHOI.
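
For scripts that report on ORBCOMM processing, the exit status list above can be kept as a lookup table. The sketch below copies the codes and actions from this section; the names ORBCOMM_EXIT_STATUS and describe_exit_status are hypothetical and are not part of the operational software.

```python
# Exit status -> (meaning, recommended action), as listed in this section.
ORBCOMM_EXIT_STATUS = {
    1: ("OK", None),
    2: ("WMO ID in 'nc' file is a 'fill' value (0)", "Inform WHOI."),
    3: ("WMO number not found in meta files",
        "Search in preliminary meta files. Inform WHOI."),
    4: ("WMO ID in 'nc' filename differs from WMO ID inside the 'nc' file",
        "Inform WHOI. The ID in the filename may be a transmission ID."),
    5: ("WMO ID in 'nc' file cannot be alphanumeric",
        "Inform WHOI. The ID may be a transmission ID."),
    6: ("WMO instrument type from meta file and 'nc' file do not match",
        "Inform WHOI. Which one is correct?"),
    7: ("WMO number from 'meta' file and 'nc' file do not match",
        "Inform WHOI. The ID in the filename may be a transmission ID."),
    8: ("Number of columns is not 3 (p, t, c)",
        "Inform WHOI. Was a new parameter introduced?"),
    9: ("Profile direction (up/down) is inconsistent with the filename convention",
        "Inform WHOI. Which one is correct?"),
    10: ("Not a NetCDF file", "Inform WHOI."),
}


def describe_exit_status(status: int) -> str:
    """Return a one-line description of an exit status and its action."""
    entry = ORBCOMM_EXIT_STATUS.get(status)
    if entry is None:
        return f"{status}: unknown exit status"
    meaning, action = entry
    if action is None:
        return f"{status}: {meaning}"
    return f"{status}: {meaning}. Action: {action}"
```

Codes outside the range 1 to 10 are reported as unknown rather than raising an error, so a new status introduced upstream still produces a readable message.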

5) Conversion of eps files to gif files

  • Sometimes convert fails for individual files.
Symptoms: An email with the subject "CP_CONVERT_EPS2GIF - Failure detected" is received (and, in most cases, /var/tmp is filling up, so an email stating that / is full will also be received).
a) Check and clean up /var/tmp.
b) Look for the file names in the email.
c) Go to /partition/ARGO_OPERATIONS/web/images and ll *.eps
d) Check whether the corresponding gif files are more recent than the eps files. For each file where the gif is not more recent, run:
    convert -density 216 -geometry 38% file.eps file.gif
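
Steps c) and d) above can be sketched as a short script that lists the eps files whose gif is not more recent and builds the convert command for each. This is an illustration only: the function names are hypothetical, and treating a missing gif as needing conversion is an assumption not stated in the procedure.

```python
from pathlib import Path
from typing import List


def eps_needing_conversion(image_dir: str) -> List[Path]:
    """Return eps files whose gif is missing or not more recent than the eps."""
    stale = []
    for eps in sorted(Path(image_dir).glob("*.eps")):
        gif = eps.with_suffix(".gif")
        # Assumption: a missing gif is treated like an outdated one.
        if not gif.exists() or gif.stat().st_mtime <= eps.stat().st_mtime:
            stale.append(eps)
    return stale


def convert_command(eps: Path) -> List[str]:
    """Build the convert command from step d) for one eps file."""
    return ["convert", "-density", "216", "-geometry", "38%",
            str(eps), str(eps.with_suffix(".gif"))]
```

Each command list can be passed to subprocess.run(cmd, check=True) once /var/tmp has been cleaned up as in step a).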