1.1 HOW-TO Ingest raw data into the database

Before any data can be processed by the Astro-WISE system, it must be ingested. Ingestion here means: splitting the images, storing them on the fileserver, and making an entry in the database. This chapter describes the necessary preparations for ingestion, and the ingestion process itself.

1.1.1 Preparations for the ingest

The images to ingest should first be sorted based on their intended use (i.e. their purpose; see http://www.astro-wise.org/portal/howtos/man_howto_schedule/man_howto_schedule.shtml#astrowise_compliant_processing). For locally stored, uncompressed copies of each image proceed as follows (if the images are compressed, either decompress them first or see the tips further below):

1)
Identify the files by collecting relevant header items. This can be done by using something like gethead <header items> *.fits. However, if there are too many files, the argument list becomes too long for the shell to handle. In this case, use foreach instead, and append the output to a file:
> foreach i ( *.fits )
foreach? gethead "HIERARCH ESO TPL ID" "HIERARCH ESO DPR TYPE" IMAGETYP \
OBJECT "HIERARCH ESO INS FILT1 ID" "HIERARCH ESO INS FILT ID" EXPTIME $i \
>> headers.txt
foreach? end
Explanation: foreach is a C shell looping construct, and the trailing backslashes continue the command on the next line. gethead is a WCSTools program to get header items from FITS files. Hence for each FITS file a few relevant header items are read and appended to the file "headers.txt". WCSTools is most likely installed on your system; website: http://tdc-www.harvard.edu/software/wcstools/.
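If you use bash rather than the C shell, the same loop reads (a sketch, with the header items and output file as above):
$ for i in *.fits ; do
    gethead "HIERARCH ESO TPL ID" "HIERARCH ESO DPR TYPE" IMAGETYP \
      OBJECT "HIERARCH ESO INS FILT1 ID" "HIERARCH ESO INS FILT ID" EXPTIME $i \
      >> headers.txt
  done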
2)
Group the images by the purpose for which they were observed. This grouping is based on the header information retrieved in the previous step. For example:
> grep "BIAS, READNOISE" headers.txt > readnoise.txt
> grep "FLAT, DOME" headers.txt > domes.txt
> grep "FLAT, SKY" headers.txt > twilight.txt
> grep "STD, ZEROPOINT" headers.txt > photom.txt
> grep "NGC" headers.txt > science.txt
This will be easy if the guidelines for scheduling observations given in http://www.astro-wise.org/portal/howtos/man_howto_schedule/man_howto_schedule.shtml have been followed.
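As a quick check that every file ended up in a group, count lines with wc (see also the tips below); the group counts should add up to the line count of headers.txt:
> wc -l readnoise.txt domes.txt twilight.txt photom.txt science.txt headers.txt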
3)
Use, for example, the editor vim to remove anything but the file names from the text files produced in the previous step:
> vim readnoise.txt
Then type ":" to enter command-line mode (Esc cancels) and enter:
:%s/fits.*/fits/
This is a regular-expression search-and-replace that replaces each occurrence of "fits<something>" with "fits". "%" means for all lines, "s" is for substitute, the "/"'s separate the search and replace expressions, and ".*" matches zero or more characters of any kind.
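If you prefer a non-interactive command over editing each file by hand, the same substitution can be done with sed (a sketch; it assumes a sed that supports in-place editing with -i, such as GNU sed):
> sed -i 's/fits.*/fits/' readnoise.txt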
4)
You now have files containing a list of FITS filenames (one per line) named after the purpose for which the data was obtained. Now move the FITS files (or links) to subdirectories named after this purpose, for example:
> mkdir READNOISE
> foreach i (`cat readnoise.txt`)
foreach? mv $i READNOISE
foreach? end
That is it for the preparation. There are, of course, many ways to do it, but this way is quite fast for any number of files.
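Step 4 can also be performed for all groups in one loop (a sketch; here the subdirectories are named after the lists rather than in upper case, and for very long lists the argument-length limit mentioned in step 1 applies to the backquoted cat):
> foreach p ( readnoise domes twilight photom science )
foreach? mkdir -p $p
foreach? mv `cat $p.txt` $p
foreach? end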

1.1.1.1 Tips and possible complications

It may be helpful, especially when ingesting many files, to place symbolic links to the raw MEF files in your current working directory:

> foreach i ( /ome03/users/data/WFI/2004/10/*.fits )
foreach? ln -s $i
foreach? end

In case the files are compressed with a common Unix compression program (gzip, compress or bzip2), just make the links to the compressed files in the same way:

> foreach i ( /ome03/users/data/WFI/2004/10/*.fits.Z )
foreach? ln -s $i
foreach? end
You now have links to all the files you want to ingest in your current working directory.

In case the images are compressed, you can still collect the header items (step 1 above) without decompressing the files in full; for bzip2, work as follows:

> foreach i ( *.fits.bz2 )
foreach? dd bs=500k count=1 if=$i | bzip2 -qdc > hdr.txt
foreach? echo -n "$i " >> headers.txt
foreach? gethead "HIERARCH ESO TPL ID" "HIERARCH ESO DPR TYPE" IMAGETYP OBJECT \
"HIERARCH ESO INS FILT1 ID" "HIERARCH ESO INS FILT ID" EXPTIME hdr.txt \
>> headers.txt
foreach? end

(Explanation: dd reads one block of size 500k from the start of the compressed input file $i. The output is decompressed by bzip2 and redirected to a plain-text file; the -q flag suppresses the warning about the truncated compressed stream. You can then use gethead on this file to get the header items. Output is appended to the same file "headers.txt".)
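For gzip- or compress-compressed files the same trick should work with zcat in place of bzip2 (a sketch; zcat will complain on stderr about the truncated stream, but the partial decompressed output containing the header is what matters):

> foreach i ( *.fits.gz )
foreach? dd bs=500k count=1 if=$i | zcat > hdr.txt
foreach? echo -n "$i " >> headers.txt
foreach? gethead "HIERARCH ESO TPL ID" "HIERARCH ESO DPR TYPE" IMAGETYP OBJECT \
"HIERARCH ESO INS FILT1 ID" "HIERARCH ESO INS FILT ID" EXPTIME hdr.txt \
>> headers.txt
foreach? end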

Other commands that may be of use:

> fgrep [-v] -f <file1> <file2>  -- Print the lines of <file2> that do (or,
                                    with -v, do not) match a line of <file1>
                                    (diff works much slower on large files).
> wc <file>                      -- Count lines, words and characters.
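For example, fgrep can be used to list files that did not end up in any of the groups made earlier (a sketch; the file names all.txt and classified.txt are arbitrary):

> ls *.fits > all.txt
> cat readnoise.txt domes.txt twilight.txt photom.txt science.txt > classified.txt
> fgrep -v -f classified.txt all.txt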

1.1.2 Ingesting data

The actual ingestion of the data is handled by a Recipe called Ingest.py, which can be found in $AWEPIPE/astro/toolbox/ingest. If your username is AWJSMITH the Recipe is invoked from the Unix command line with the following command:

   env project='AWJSMITH' awe $AWEPIPE/astro/toolbox/ingest/Ingest.py -i <raw data> -t <type> [-commit]
where <raw data> is one or more file names (for example WFI*.fits), and <type> is the type of the data to be ingested. Setting the environment variable project ensures that the data is ingested into your personal context; see the Context HOW-TO (http://www.astro-wise.org/portal/howtos/man_howto_context/man_howto_context.shtml) for a description of the notion of context. To get a list of all possible values for the -t parameter, just type:
   awe $AWEPIPE/astro/toolbox/ingest/Ingest.py
and an on-screen "manual" will show up.

Running the Ingest.py recipe, making good use of the preparations described in the previous section, is done as follows (read-noise data is taken as an example):

> cd READNOISE
> env project='AWJSMITH' awe $AWEPIPE/astro/toolbox/ingest/Ingest.py -i *.fits -t readnoise -commit
An alternative command, using science data as an example, is:
> cd SCIENCE
> foreach i (*.fits)
foreach? env project='AWJSMITH' awe $AWEPIPE/astro/toolbox/ingest/Ingest.py -i $i -t science -commit
foreach? end
Important note: due to the nature of the ingestion script, this looped form, which ingests one file per invocation, can only be used for lists of individual science images.

The input data of the ingest script should be in the form of Multi-Extension FITS (MEF) files; most wide-field cameras write the data from their multi-CCD detector block in this form. The ingestion step splits an MEF file into its extensions, creates an object (a software construct) for each extension, stores each extension separately on a dataserver, and then commits the object, with the relevant header items attached to it, to the database. Note that each extension is also saved locally, so make sure there is enough free space at the location where you run the ingest script; after ingesting, the local copies of the FITS files can be removed. The -commit switch is necessary to actually store/commit data; if it is not specified, nothing is written to the dataserver or committed to the database. A log of the ingest process is generated, in a file called something like <datetime>.log.
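Since nothing is stored without the -commit switch, the same command can first be run without it as a dry run, to check that the files are recognized correctly before anything is written (read noise taken as an example again):

> env project='AWJSMITH' awe $AWEPIPE/astro/toolbox/ingest/Ingest.py -i *.fits -t readnoise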

Each file that is ingested needs to be named according to our filenaming convention. This means that the MEF file is named as follows:

    <instrument>.<date_obs>.fits
Example: WFI.2001-02-13T01:02:03.123.fits

If the file to be ingested is not named according to this convention, a symbolic link with the correct name is created, and the image is ingested under that filename. Hence the ingested image may not retain its original filename.
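To preview which conventional name a raw file will get, you can construct it from the header (a sketch; it assumes the <date_obs> part is taken from the standard DATE-OBS header item and uses WFI as the instrument, as in the example; the ingest script itself takes care of the actual renaming):

> foreach i ( *.fits )
foreach? echo "$i -> WFI.`gethead DATE-OBS $i`.fits"
foreach? end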

