Our goals were:
(a) to reduce the size of the catalogue by a large factor (about 4) without losing the integrity of the catalogue;
(b) to make the encoding/decoding procedures for a single GSC region as fast as possible;
(c) to make the process of loading the compressed version on disk and running application tools machine independent (e.g. able to run on both DEC and SUN stations);
(d) to build an efficient search tool for the compressed version of the GSC.
A GSC record, which originally occupies 45 bytes, is stored on disk using only 96 bits (12 bytes). Each field in the record is offset and scaled appropriately, then coded in a number of bits that depends on the dynamic range of the field.
The following table shows the number of bits used to encode each field of a GSC record:
No. | GSC field | bits | range |
1 | GSC-ID | 14 | 16384 |
2 | RA | 22 | 4194304 |
3 | DEC | 19 | 524288 |
4 | pos-error | 9 | 512 |
5 | magnitude | 11 | 2048 |
6 | mag-error | 7 | 128 |
7 | mag-band (coded) | 4 | 16 |
8 | class. | 3 | 8 |
9 | plate-id (coded) | 4 | 16 |
10 | multiple (coded) | 1 | 2 |
Two more bits are kept as spares, bringing the total from 94 to 96 bits.
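The packing described above can be illustrated with a minimal Python sketch. The field widths come from the table; everything else is an assumption made for illustration: the names `quantize`, `pack_record` and `unpack_record`, the order in which fields are laid out within the 96 bits, the position of the spare bits, and the big-endian byte order (chosen here only because a fixed byte order is one simple way to keep the file machine independent).

```python
# Bit widths from the table above, plus the 2 spare bits (total 96).
# The layout order within the 96-bit word is an assumption.
FIELD_BITS = [
    ("gsc_id", 14), ("ra", 22), ("dec", 19), ("pos_error", 9),
    ("magnitude", 11), ("mag_error", 7), ("mag_band", 4),
    ("classification", 3), ("plate_id", 4), ("multiple", 1),
    ("spare", 2),
]
assert sum(bits for _, bits in FIELD_BITS) == 96


def quantize(value, offset, scale):
    """Offset and scale a physical value to a non-negative integer."""
    return round((value - offset) * scale)


def dequantize(code, offset, scale):
    """Inverse of quantize, up to the rounding error 0.5 / scale."""
    return offset + code / scale


def pack_record(values):
    """Pack integer field codes into 12 bytes (hypothetical layout)."""
    word = 0
    for name, bits in FIELD_BITS:
        code = values.get(name, 0)
        if not 0 <= code < (1 << bits):
            raise ValueError(f"{name}={code} does not fit in {bits} bits")
        word = (word << bits) | code
    # A fixed byte order keeps the result identical on every machine.
    return word.to_bytes(12, "big")


def unpack_record(data):
    """Recover the integer field codes from a 12-byte record."""
    word = int.from_bytes(data, "big")
    values = {}
    for name, bits in reversed(FIELD_BITS):
        values[name] = word & ((1 << bits) - 1)
        word >>= bits
    return values


# Example with made-up values: a region whose lower boundaries are
# RA = 10.0 and DEC = -5.0, using the GSC 1.1 scaling factors.
record = {
    "gsc_id": 123,
    "ra": quantize(10.12345, 10.0, 100000),
    "dec": quantize(-4.5, -5.0, 100000),
    "pos_error": quantize(0.3, 0.0, 10),
    "magnitude": quantize(12.34, 0.0, 100),
    "mag_error": quantize(0.4, 0.0, 100),
    "mag_band": 1, "classification": 0, "plate_id": 2, "multiple": 0,
}
packed = pack_record(record)
```

The range check in `pack_record` mirrors the "range" column of the table: a code that exceeds 2 to the power of the field width cannot be stored and is rejected rather than silently truncated.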
At the beginning of each coded region, a header contains information on the whole encoding process. Although we used the same encoding procedure for all regions, each region is formally independent and is decoded according to the content of its header. The header is in ASCII and has the following content:
Field in header | Settings for GSC 1.1 |
length of header | first 3 characters of header |
version | 2 (for GSC 1.1) |
scaling factors: | |
RA | 100,000 |
DEC | 100,000 |
pos-error | 10 |
magnitude | 100 |
mag-error | 100 |
offsets: | |
RA | lower RA boundary |
DEC | lower DEC boundary |
mag | 0 |
number of plates | in the region |
plate list | plates used in the region |
Fields are separated by spaces. Additional spaces are added, if required, at the end of the header, to make its length a multiple of 4 bytes. Then the bit-encoded region follows, at an offset given by the first field in the header.
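The header layout can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `build_header` and `split_region` are hypothetical, the length is assumed to be written as a right-justified decimal number in the first 3 characters, numeric values are assumed to be written without thousands separators, and the sample field values are made up.

```python
def build_header(fields):
    """Build an ASCII header padded with spaces to a multiple of 4 bytes.

    `fields` holds every header field except the leading length field.
    """
    body = " ".join(str(f) for f in fields)
    # 3 characters for the length, one separating space, then the body.
    length = 3 + 1 + len(body)
    length += -length % 4  # round up to the next multiple of 4
    if length > 999:
        raise ValueError("header length does not fit in 3 characters")
    return f"{length:3d} {body}".ljust(length).encode("ascii")


def split_region(blob):
    """Split a coded region into its header fields and bit-encoded data."""
    # The first field gives the offset at which the encoded data starts.
    offset = int(blob[:3].decode("ascii"))
    fields = blob[3:offset].decode("ascii").split()
    return fields, blob[offset:]


# Made-up example: version, five scaling factors, three offsets,
# number of plates, then the plate list.
header = build_header(
    ["2", "100000", "100000", "10", "100", "100",
     "10.0", "-5.0", "0", "2", "0123", "0456"]
)
fields, payload = split_region(header + b"\x01\x02\x03")
```

Because the offset is read from the header itself, a decoder written this way does not need to know the header length in advance, which fits the statement that each region is decoded according to the content of its own header.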