User guide for the Gaussian Mixture classification code written in FORTRAN (gmclass.f).
----------------------------------------------------------------------------------------

Please send comments/questions to Jonas Debosscher: jonas@ster.kuleuven.be
Any feedback is appreciated!

Files included:
-----------------

'gmreadme.txt': this readme file.
'gmclass.f': the Gaussian Mixture classification code (FORTRAN), as described in Debosscher et al., 2007 (AA/2007/7638).
'defatts.dat': file with the training set (transformed light curve attributes, according to the transformation described in Debosscher et al., 2007). This is the input training file for the classification code.
'init.dat': initialization file for the 'gmclass.f' code. Here, the number and kind of variability classes and classification attributes, to be used during a classification run, are given.


Prerequisites:
---------------

The code has been successfully compiled and run under SUSE Linux 10.1 using the Intel Fortran compiler (version 8.1 or higher).

Compiling and running the code (make sure you are in the code directory!):
--------------------------------------------------------------------------


To compile the code (command line):

>ifort gmclass.f -o executable

where 'executable' is the desired filename of the executable file.

To run the code (command line):

>./executable deffile datfile initfile outfile

where the meaning of the 4 command-line arguments is as follows:

'deffile': the file with the training set (to define the classes). Here, the file 'defatts.dat' has to be used.
'datfile': the file with the dataset to be classified, e.g. the file 'defatts.dat' can be used here as well.
'initfile': initialization file, e.g. the file 'init.dat'
'outfile': the desired output filename.

Note: make sure that all input files are in the same directory as the executable file. The output file will be created in this directory as well.
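
For users who want to script many runs, the command line above can be wrapped in a small helper. The following Python sketch is only an illustration and is not part of the distribution: the function names 'build_command' and 'run_classifier' are hypothetical, and the helper assumes the executable and all input files sit in one working directory, as required by the note above.

```python
from pathlib import Path
import subprocess


def build_command(executable, deffile, datfile, initfile, outfile):
    """Return the argument list for one run, in the order the code expects."""
    return [f"./{executable}", deffile, datfile, initfile, outfile]


def run_classifier(workdir, executable, deffile, datfile, initfile, outfile):
    """Check that all inputs exist in workdir, then run the executable there."""
    workdir = Path(workdir)
    for name in (executable, deffile, datfile, initfile):
        if not (workdir / name).is_file():
            raise FileNotFoundError(f"{name} not found in {workdir}")
    # Run from inside workdir so that the output file is created there too.
    subprocess.run(
        build_command(executable, deffile, datfile, initfile, outfile),
        cwd=workdir,
        check=True,
    )
    return workdir / outfile
```

A call such as run_classifier(".", "executable", "defatts.dat", "defatts.dat", "init.dat", "out.dat") would then correspond to the example command given above.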


Description of the input files:
-------------------------------


File with training set ('defatts.dat'):
--------------------------------------------------------

This file contains all the classification attributes (transformed light curve parameters), used to define the stellar variability classes.

FORTRAN line format of this file ('DEFATTFORMAT'): '(a30,1x,29(f13.8,1x),a6)'

Column description:

1)'name': object identifier
2)'f1': main frequency present in the light curve (cycles/day)
3)'f2': second frequency (cycles/day)
4)'f3': third frequency (cycles/day)
5)'amp11': amplitude of the first harmonic of 'f1' (mag)
6)'amp12': amplitude of the second harmonic of 'f1' (mag)
7)'amp13': amplitude of the third harmonic of 'f1' (mag)
8)'amp14': amplitude of the fourth harmonic of 'f1' (mag)
9)'amp21': amplitude of the first harmonic of 'f2' (mag)
10)'amp22': amplitude of the second harmonic of 'f2' (mag)
11)'amp23': amplitude of the third harmonic of 'f2' (mag)
12)'amp24': amplitude of the fourth harmonic of 'f2' (mag)
13)'amp31': amplitude of the first harmonic of 'f3' (mag)
14)'amp32': amplitude of the second harmonic of 'f3' (mag)
15)'amp33': amplitude of the third harmonic of 'f3' (mag)
16)'amp34': amplitude of the fourth harmonic of 'f3' (mag)
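17)'phdiff12': phase of the 'amp12' component when the phase of the 'amp11' component is set to 0 (radians)
18)'phdiff13': phase of the 'amp13' component when the phase of the 'amp11' component is set to 0 (radians)
19)'phdiff14': phase of the 'amp14' component when the phase of the 'amp11' component is set to 0 (radians)
20)'phdiff21': phase of the 'amp21' component when the phase of the 'amp11' component is set to 0 (radians)
21)'phdiff22': phase of the 'amp22' component when the phase of the 'amp11' component is set to 0 (radians)
22)'phdiff23': phase of the 'amp23' component when the phase of the 'amp11' component is set to 0 (radians)
23)'phdiff24': phase of the 'amp24' component when the phase of the 'amp11' component is set to 0 (radians)
24)'phdiff31': phase of the 'amp31' component when the phase of the 'amp11' component is set to 0 (radians)
25)'phdiff32': phase of the 'amp32' component when the phase of the 'amp11' component is set to 0 (radians)
26)'phdiff33': phase of the 'amp33' component when the phase of the 'amp11' component is set to 0 (radians)
27)'phdiff34': phase of the 'amp34' component when the phase of the 'amp11' component is set to 0 (radians)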
28)'trend': slope of the linear trend (magnitude/day)
29)'varrat': ratio of the variance after subtraction of the least-squares fit with 'f1' and its 4 harmonics to the variance before subtraction (values between 0 and 1)
30)'varred': final variance reduction due to subtraction of all the periodic signals (values close to 1 if the fit is good, close to 0 if the fit is poor)
31)'varcode': class code, if present (in the training set, this is always present).
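
As an illustration of the fixed-width layout implied by 'DEFATTFORMAT', the following Python sketch reads one line of the training file. It is not part of the distribution. The function name 'parse_training_line' and the list 'ATTRIBUTE_NAMES' are hypothetical; the column widths (30 characters for the name, 29 fields of 13 characters each followed by one blank, 6 characters for the class code) are taken directly from the format string above.

```python
# Names of the 29 numerical columns, in the order of the column description.
ATTRIBUTE_NAMES = (
    ["f1", "f2", "f3"]
    + [f"amp{i}{j}" for i in (1, 2, 3) for j in (1, 2, 3, 4)]
    + [f"phdiff{i}{j}" for i in (1, 2, 3) for j in (1, 2, 3, 4) if (i, j) != (1, 1)]
    + ["trend", "varrat", "varred"]
)


def parse_training_line(line):
    """Split one '(a30,1x,29(f13.8,1x),a6)' line into name, attributes, class code."""
    name = line[0:30].strip()
    pos = 31  # skip the 30-character name and one blank
    values = {}
    for att in ATTRIBUTE_NAMES:
        values[att] = float(line[pos:pos + 13])
        pos += 14  # 13-character number plus one blank
    varcode = line[pos:pos + 6].strip()  # empty if no class code is present
    return name, values, varcode
```

Slicing by position rather than splitting on blanks keeps the parser correct for object identifiers that contain spaces.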


File with dataset to be classified:
------------------------------------

This file contains the attributes of the objects to be classified (1 line per object). The format of the lines is specified by 'ATTFORMAT' (in the initialization file 'init.dat'). The order and kind of attributes (columns) have to be the same as for the file with the training set (apart from the missing class label, which is unknown).

Initialization file ('init.dat'):
---------------------------------------

This file specifies the number/kind of classes and classification attributes to be used
by the 'gmclass.f' code. These can be changed by the user if desired.

Line description for this file (in this order and format!):

'NCLASS' (integer): Number of classes to use for the classification run (should be smaller than the total number of available training classes, see remark below).
'CLASSCODE(1:NCLASS)' (NCLASS times character*6): The 'NCLASS' codes of the classes to be used (one per line). These codes are the ones in the last column of the attribute lines for the objects in the training set. The order is not important (see below for a list of all the classes considered + their code).
'NATT' (integer): The total number of columns in the object attribute lines.
'ATTDS(1:NATT)' (NATT times character*10): Description of the columns of the object attribute lines. The order is important.
'NCLASSATT' (integer): The number of attributes that will be used for classifying.
'CLASSATTDS(1:NCLASSATT)' (NCLASSATT times character*10): Description of the attributes to be used for classifying. The order is not important.
'DEFATTFORMAT' (character*100): Fortran format of the object lines in the training set.
'ATTFORMAT' (character*100): Fortran format of the object lines in the database to be classified.
'OUTFORMAT' (character*100): Fortran format of the output lines.
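
The line order above can be sketched as a small Python helper that assembles the text of an initialization file. This is only an illustration, not part of the distribution, and it rests on several assumptions that should be checked against the supplied 'init.dat': that every array entry goes on its own line (this is stated above only for 'CLASSCODE'), that the integers can be written without special formatting, and that the format strings are written exactly as passed in. Whether the identifier and class-code columns count towards 'NATT' is not stated here either. The function name 'build_init_text' is hypothetical.

```python
def build_init_text(class_codes, column_names, class_attributes,
                    defatt_format, att_format, out_format):
    """Assemble the lines of an initialization file in the documented order."""
    # Length limits follow the declared types: character*6, *10 and *100.
    for code in class_codes:
        if len(code) > 6:
            raise ValueError(f"class code too long: {code!r}")
    for desc in list(column_names) + list(class_attributes):
        if len(desc) > 10:
            raise ValueError(f"attribute description too long: {desc!r}")
    for fmt in (defatt_format, att_format, out_format):
        if len(fmt) > 100:
            raise ValueError("format string longer than 100 characters")
    # Every attribute used for classifying must be one of the described columns.
    unknown = [a for a in class_attributes if a not in column_names]
    if unknown:
        raise ValueError(f"attributes not among the columns: {unknown}")

    lines = [str(len(class_codes))]          # NCLASS
    lines += list(class_codes)               # CLASSCODE(1:NCLASS)
    lines.append(str(len(column_names)))     # NATT
    lines += list(column_names)              # ATTDS(1:NATT), order matters
    lines.append(str(len(class_attributes))) # NCLASSATT
    lines += list(class_attributes)          # CLASSATTDS(1:NCLASSATT)
    lines += [defatt_format, att_format, out_format]
    return "\n".join(lines) + "\n"
```

Deriving 'NCLASS', 'NATT' and 'NCLASSATT' from the list lengths avoids a mismatch between a count and the number of entries that follow it.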

Normally, 'DEFATTFORMAT' and 'ATTFORMAT' are identical, apart from the fact that the object lines in the training set contain an extra column with the known class code of the object. The file with training objects ('deffile') can thus also be used as the dataset to be classified ('datfile'). The known class labels in 'datfile' will not be read by the code in this case. This allows the user to perform 'resampling' experiments, as described in Debosscher et al., 2007.
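
When the training file is classified against itself, the known class codes can be compared with the most probable class assigned by the code. The Python sketch below shows one simple way to summarize such a comparison; it is not necessarily the procedure followed in Debosscher et al., 2007, and the function name 'resampling_summary' is hypothetical. It assumes the user has already collected two dictionaries mapping object identifiers to class codes: one from the training file and one from the output file.

```python
from collections import Counter


def resampling_summary(known, predicted):
    """Return (fraction correct, counts per (known code, assigned code) pair)."""
    confusion = Counter()
    for name, true_code in known.items():
        if name in predicted:  # ignore objects missing from the output
            confusion[(true_code, predicted[name])] += 1
    total = sum(confusion.values())
    correct = sum(n for (true, assigned), n in confusion.items() if true == assigned)
    accuracy = correct / total if total else float("nan")
    return accuracy, confusion
```

The off-diagonal entries of the returned counter show which classes tend to be confused with each other.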

Important remark:
------------------

Make sure that the number of attributes used in a classification run is always strictly smaller than the minimum number of light curves used to define any class included in the classification run (see also Debosscher et al., 2007). If not, the code will stop and print the error message 'CLASSNDEFOBJ <= NCLASSATT'. For the moment, this makes it impossible to include all 35 classes in a classification run, since for some of them only one or two training light curves are available so far (see list below).
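
The condition above can be checked before starting a run. The Python sketch below is an illustration only, not part of the distribution. The dictionary 'TRAINING_COUNTS' copies the numbers of definition light curves from the class list at the end of this file; the function names 'blocking_classes' and 'max_attributes' are hypothetical.

```python
# Number of definition light curves per class code (from the class list).
TRAINING_COUNTS = {
    "PVSG": 76, "BE": 57, "BCEP": 58, "CLCEP": 195, "CP": 63, "DSCUT": 139,
    "ELL": 16, "GDOR": 35, "HAEBE": 21, "XB": 9, "LBOO": 13, "LBV": 21,
    "MIRA": 144, "PTCEP": 24, "ROAP": 4, "RRAB": 129, "RRC": 29, "RVTAU": 13,
    "SR": 42, "SPB": 47, "SXPHE": 7, "TTAU": 17, "WR": 63, "FUORI": 3,
    "SDBV": 16, "EA": 169, "EB": 147, "EW": 59, "RRD": 57, "DMCEP": 95,
    "SLR": 1, "DAB": 2, "DBV": 1, "GWVIR": 2, "CV": 3,
}


def blocking_classes(n_class_att, class_codes, counts=TRAINING_COUNTS):
    """Return the selected classes with too few light curves for n_class_att."""
    # A class is acceptable only if its count is strictly larger than NCLASSATT.
    return [code for code in class_codes if counts[code] <= n_class_att]


def max_attributes(class_codes, counts=TRAINING_COUNTS):
    """Largest NCLASSATT allowed for the given selection of classes."""
    return min(counts[code] for code in class_codes) - 1
```

For example, a selection containing 'ROAP' (4 light curves) allows at most 3 classification attributes, whatever the other selected classes are.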


Description of the output file:
-------------------------------

After the classification run, the output file will contain one line per classified object.
This line contains the following information (in this order):

'Object identifier'
'Code of the most probable class'
'Code of the second most probable class'
'Code of the third most probable class'
'Mahalanobis distance for the most probable class'
'Normalized relative probability for the most probable class'
'Normalized relative probability for the second most probable class'
'Normalized relative probability for the third most probable class'

The FORTRAN format of these lines is as follows: '(a30,1x,3(a6,1x),f8.2,1x,3(f8.6,1x))'
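
To post-process the results, each output line can be split by character position according to the format above. The Python sketch below is an illustration only, not part of the distribution; the function name 'parse_output_line' and the dictionary keys are hypothetical. The offsets follow from the format string: 30 characters for the identifier, three 6-character class codes, one 8-character distance and three 8-character probabilities, each separated by one blank.

```python
def parse_output_line(line):
    """Split one '(a30,1x,3(a6,1x),f8.2,1x,3(f8.6,1x))' output line."""
    name = line[0:30].strip()
    # Three class codes start at column 31; each takes 6 characters plus a blank.
    codes = [line[31 + 7 * k:37 + 7 * k].strip() for k in range(3)]
    # Mahalanobis distance for the most probable class (f8.2), then one blank.
    distance = float(line[52:60])
    # Three probabilities start at column 61; each takes 8 characters plus a blank.
    probs = [float(line[61 + 9 * k:69 + 9 * k]) for k in range(3)]
    return {
        "name": name,
        "classes": codes,
        "mahalanobis": distance,
        "probabilities": probs,
    }
```

The class codes and probabilities are returned in the same order, so the first entry of each list refers to the most probable class.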


Stellar variability classes considered (+code and number of definition light curves):
--------------------------------------------------------------------------------------


'PVSG'    Periodically variable supergiants (76)
'BE'      Variable Be-stars (57)
'BCEP'    Beta-Cephei stars (58)
'CLCEP'   Classical Cepheids (195)
'CP'      Chemically peculiar stars (63)
'DSCUT'   Delta-Scuti stars (139)
'ELL'     Ellipsoidal variables (16)
'GDOR'    Gamma-Doradus stars (35)
'HAEBE'   Herbig Ae/Be stars (21)
'XB'      X-ray binaries (9)
'LBOO'    Lambda-Bootis variables (13)
'LBV'     Luminous Blue variables (21)
'MIRA'    Mira variables (144)
'PTCEP'   Population II Cepheids (24)
'ROAP'    Rapidly Oscillating Ap stars (4)
'RRAB'    RR-Lyrae stars, subtype ab (129)
'RRC'     RR-Lyrae stars, subtype c (29)
'RVTAU'   RV-Tauri stars (13)
'SR'      Semi-regular variables (42)
'SPB'     Slowly Pulsating B-stars (47)
'SXPHE'   SX-Phe stars (7)
'TTAU'    T-Tauri stars (17)
'WR'      Wolf-Rayet stars (63)
'FUORI'   FU-Ori stars (3)
'SDBV'    Pulsating subdwarf B-stars (16)
'EA'      Eclipsing binaries, subtype EA (169)
'EB'      Eclipsing binaries, subtype EB (147)
'EW'      Eclipsing binaries, subtype EW (59)
'RRD'     Double-mode RR-Lyrae stars (57)
'DMCEP'   Double-mode Cepheids (95)
'SLR'     Solar-like oscillations in red giants (1)
'DAB'     Pulsating DA white dwarfs (2)
'DBV'     Pulsating DB white dwarfs (1)
'GWVIR'   GW-Virginis stars (2)
'CV'      Cataclysmic variables (3)