Oracle9i as associate tool?

The Astronomy.

From digital astronomical images sources are extracted by a special purpose program. Each source can have about a 100 parameters. The sources extracted from (part of) an astronomical image are stored in a so called source list. Within a source list all sources have the same parameters. For different source lists the parameter set might also be different. But each source list has a fixed set of parameters like the position and shape of the individual sources.
The position of a source is defined relative to an astronomical coordinate system. This coordinate system projects the position of the sources on a unit sphere (with radius R = 1, hour angle H.A. and declination D, with its centre located at the centre of the earth). On this unit sphere all positions are defined in terms of hour angle and declination (0o < H.A.< 360o, -90o< D <90o).
The geometrical shape of a source is usually an ellipse defined as a central position, a major and minor axis and a position angle of the major axis. The central position denotes the position or centre of the source, the surrounding ellipsoid the uncertainty in the position or a certain isophotal contour.
In order to monitor astronomical objects different source lists in a particular area of the sky have to be inspected. In principle we have to associate astronomical objects belonging to one source list with those belonging to a different source list. This is done, e.g., by finding sources which have overlapping parts with sources in other source lists in a common part of the sky.
When the so called associations are recognzed they should be linked, through pointers in the database. These associations are stored in a so called associate list where each entry points to the entries in de source lists were the sources are found which have overlapping areas.

The Implementation.

The oracle equivalent of a source list obviously would be a table. So the storing of the source parameters would cause no problem at all. There are even different ways of implementing this, but let us consider only the non OO implementation, the relational implementation, which (at the moment) seems to be the most simple and flexible way to store different source lists.
Oracle also offers a spatial coordinate system optimized for spatial queries, Oracle Spatial. Use of this package requires the definition of a geometrical coordinate system and a geometrical shape of the sources. Here the first problem arises since the standard spatial system of geodetic coordinates does not contain a non-earth bound coordinate system, in other words, all coordinates are expected to lie on a flattened sphere with finite radius. The latter causes not such a problem since we can always use a unit sphere, but the flattening problem is even worse since they only allow you to define the INVERSE flattening which is in our case 1/0!
Oracle's (and our temporary) remedy would be to "use a billion for the inverse flattening which would give us an accuracy of 1 part in a billion." This however might not be sufficient and one wonders how much cpu is lost in coordinate transformations which are much too complex for our purpose.
Another problem is that Oracle Spatial does not allow for an ellipsoidal geometrical shape. Only rectangles and polygons are allowed. For the time being we could do with a rectangular shape which closely surrounds the ellipsoidals. Another solution could be to use say a 16-point polygon to fit the ellipsoidal as closely as possible.
Before running the spatial queries one has to make an index on the geometry of the source (i.e. one has to create a special sorted list so that sources within a certain region can be found quickly). Oracle supports two types op spatial indexes, the Quad-tree and R-tree. The creation of an index for 2.5×105 sources takes about 2 minutes. Note that the creation of the indexes should only occur once, unless some sources are added or removed from the table.
Since we cannot use astronomical coordinates we did the testing on normal coordinates, i.e. in a flat universe where we choose for the shape of the sources rectangles with dimensions of 10"/cosine(declination)×10". Two such tables were created. The query used was a spatial join to find any interaction between the sources in both tables:


SELECT A.ID, B.ID FROM SOURCELIST1 A, SOURCELIST2 B
    WHERE SDO_RELATE(A.SHAPE,B.SHAPE,'MASK=ANYINTERACT QUERYTYPE=JOIN') = 'TRUE';

      
The following table shows the timing results:
Number of Sources R-Tree
(seconds)
Quad-Tree
(seconds)
Python Script
(seconds)
25 1.84 0.16 0.01
250 209.20 0.46 0.04
2500 68219.15 4.38 0.46
25000 --- 55.27 5.14
250000 --- 582.95 65.01
Using the R-tree index the results were rather disappointing. Running the query on the whole table did not produce a result within at least an hour. Running it on part of the table did produce a result but it took too long. Using the Quad-tree index did produce a good result, although compared to a simple python script which does the same query the amount of time needed is about a factor 10 too much.

The conclusions.

In order to use Oracle as the association tool the following points need improvement:

The above issues have been independently discovered by the AstroGrid project. Oracle have recognised these issues and are addressing them.