- Original line count: 321729
- Columns 5 to 14 are 10 columns holding 5 dimensions and 5 errors (distance 5d and error 5d)
- Columns 20 to 39 are 20 columns holding 10 dimensions and 10 errors (distance 10d and error 10d)
- Columns 18 and 19 are classification information (by person or by machine)
- Columns 15 and 16 are two z values, denoted z_vi and z_pipeline.
- Column 17 is the error measurement for the z value.
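The column layout above can be written down as a small lookup table. This is only a sketch: the names `COLS` and `split_columns` are made up for illustration, and it assumes that within each block the dimensions come before the errors, that column 15 is z_vi and column 16 is z_pipeline, and that column 18 is the person classification and column 19 the machine one. None of these orderings is confirmed by the notes.

```python
import numpy as np

# 1-based column numbers from the notes, converted to 0-based indices.
# ASSUMPTION: dimensions come first, errors second, inside each block.
COLS = {
    "dist_5d": slice(4, 9),      # columns 5-9
    "err_5d": slice(9, 14),      # columns 10-14
    "z_vi": 14,                  # column 15 (assumed order)
    "z_pipeline": 15,            # column 16 (assumed order)
    "z_err": 16,                 # column 17
    "class_person": 17,          # column 18 (assumed order)
    "class_machine": 18,         # column 19 (assumed order)
    "dist_10d": slice(19, 29),   # columns 20-29
    "err_10d": slice(29, 39),    # columns 30-39
}

def split_columns(table):
    """Return named views into a 2-D array with at least 39 columns."""
    table = np.asarray(table)
    return {name: table[:, idx] for name, idx in COLS.items()}

# Synthetic table in which every cell holds its own 1-based column number,
# so the printed values show which columns each name picks up.
demo = np.tile(np.arange(1, 40), (3, 1))
parts = split_columns(demo)
print(parts["dist_5d"][0], parts["err_5d"][0])
print(parts["z_vi"][0], parts["z_pipeline"][0])
```

If the real file interleaves each dimension with its error, only the slices in `COLS` need to change.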
- FutureGrid xRay for WDA-SMACOF
- Local Machine for PCA
- Randomize the data.
- Clean the data by removing rows with errors smaller than 0 (total size is now 321719).
- Clean the data by removing rows with z_warning > 0 (total size is now 260789).
- Whiten the data by applying (x - mean) / std to every dimension.
- Take first 100k.
- Calculate the distance 5d plus z_vi (distance 6d), error 5d, distance 10d plus z_vi (distance 11d) and error 10d. The error is given as e(i, j) = sqrt(sum_k(error(i, k)^2 + error(j, k)^2)), where i and j are points i and j, and k is the dimension, running from 1 to 5 or from 1 to 10. The weight is then calculated as w(i, j) = 1 / e(i, j)^2.
- Note that in some cases z_vi = -1. Since z_vi should always be larger than or at least close to zero, in every case where z_vi = -1 the value is replaced with the one from z_pipeline.
- Scale the z_vi column so that its contribution to the squared distance is weighted by 40. For PCA and WDA-SMACOF, this means using z_vi * sqrt(40).
- Apply WDA-SMACOF on 6d and 11d data using 1/error as weighting.
- Apply WDA-SMACOF on 6d and 11d data using Sammon's Mapping.
- Apply PCA on these datasets.
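The preprocessing steps above can be sketched in NumPy roughly as follows. This is an illustration, not the code that was actually run. The helper names `preprocess` and `pairwise_distance_and_weight`, the `n_keep` and `z_scale` parameters, and the separate `z_warning` array are hypothetical. The sketch assumes that z_vi = -1 is replaced before standardizing and that the errors stay in their original units, and it uses synthetic data. The dense pairwise step needs O(n^2) memory, so it only suits small subsets. WDA-SMACOF itself is not shown.

```python
import numpy as np

def preprocess(dist, err, z_vi, z_pipeline, z_warning,
               n_keep=None, z_scale=40.0, seed=0):
    """Hypothetical helper: returns (points, errors) after the listed steps."""
    order = np.random.default_rng(seed).permutation(len(dist))  # randomize rows
    dist, err = dist[order], err[order]
    z_vi, z_pipeline, z_warning = z_vi[order], z_pipeline[order], z_warning[order]

    # Drop rows with any negative error or with z_warning > 0.
    keep = (err >= 0).all(axis=1) & (z_warning <= 0)
    dist, err = dist[keep], err[keep]
    # ASSUMPTION: z_vi = -1 is replaced by z_pipeline before standardizing.
    z = np.where(z_vi[keep] == -1, z_pipeline[keep], z_vi[keep])

    dist = (dist - dist.mean(axis=0)) / dist.std(axis=0)    # (x - mean) / std
    z = (z - z.mean()) / z.std()
    points = np.column_stack([dist, z * np.sqrt(z_scale)])  # 6d or 11d points
    return points[:n_keep], err[:n_keep]                    # take the first n_keep

def pairwise_distance_and_weight(points, err):
    """Dense O(n^2) distances and weights; only meant for small subsets."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    sq = (err ** 2).sum(axis=1)
    e2 = sq[:, None] + sq[None, :]   # e(i, j)^2 = sum_k err(i,k)^2 + err(j,k)^2
    w = 1.0 / e2                     # w(i, j) = 1 / e(i, j)^2
    np.fill_diagonal(w, 0.0)         # self-pairs carry no weight
    return d, w

# Synthetic stand-in for the 5d block (not real survey data).
rng = np.random.default_rng(1)
n = 200
dist = rng.normal(size=(n, 5))
err = np.abs(rng.normal(size=(n, 5))) + 0.1
err[0, 0] = -1.0                     # one row with a negative error
z_vi = rng.uniform(0.0, 3.0, size=n)
z_vi[1] = -1.0                       # one missing z_vi value
z_pipeline = rng.uniform(0.0, 3.0, size=n)
z_warning = np.zeros(n)
z_warning[2] = 4                     # one flagged row

all_points, all_err = preprocess(dist, err, z_vi, z_pipeline, z_warning)
points, err_small = preprocess(dist, err, z_vi, z_pipeline, z_warning, n_keep=100)
d, w = pairwise_distance_and_weight(points, err_small)

# PCA eigenvalues of the covariance matrix, largest first.
eigvals = np.linalg.eigvalsh(np.cov(points, rowvar=False))[::-1]
print(all_points.shape, points.shape, d.shape, w.shape)
print(eigvals)
```

Multiplying the standardized z column by sqrt(40) gives it a variance of 40, while every other dimension keeps a variance of 1.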
- 1: QSO_BAL
- 2: STAR
- 3: QSO
- 4: GALAXY
- 5: Unknown
- 6: QSO?
- 7: QSOz?
- 8: NotQuasar
- 9: NotInspected
- 10: STAR?
- 1: QSO
- 2: STAR
- 3: GALAXY
Testing results (all mappings use the person classification, with scaling, and with z_pipeline replacing z_vi where z_vi = -1):
1. WDA-SMACOF using 1/error as Weighting:
2. WDA-SMACOF using Sammon's Mapping:
3. PCA (100k data result)
- 6d: 40.06115152 4.06885203 0.63975502 0.15969881 0.05026634 0.02027628
- 11d: 4.014067e+01 5.605559e+00 2.489067e+00 1.122345e+00 6.423549e-01 2.438484e-28 1.761722e-28 3.208711e-29 2.569735e-30 1.539248e-32 1.454250e-32
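The two rows of numbers above are not labeled. Assuming they are PCA eigenvalues (the variance along each principal component), the share of variance carried by each component can be checked with a short sketch. The function name `explained_ratio` is made up for illustration.

```python
import numpy as np

# Values copied from the 6d and 11d rows above.
# ASSUMPTION: they are PCA eigenvalues (component variances).
eig_6d = np.array([40.06115152, 4.06885203, 0.63975502,
                   0.15969881, 0.05026634, 0.02027628])
eig_11d = np.array([4.014067e+01, 5.605559e+00, 2.489067e+00, 1.122345e+00,
                    6.423549e-01, 2.438484e-28, 1.761722e-28, 3.208711e-29,
                    2.569735e-30, 1.539248e-32, 1.454250e-32])

def explained_ratio(eigvals):
    """Fraction of the total variance carried by each component."""
    return eigvals / eigvals.sum()

for name, eig in (("6d", eig_6d), ("11d", eig_11d)):
    ratio = explained_ratio(eig)
    print(name, "total variance:", round(eig.sum(), 4),
          "first two components:", round(ratio[:2].sum(), 4))
```

The 6d values sum to 45 and the 11d values to about 50. That is what 5 or 10 unit-variance dimensions plus a z_vi column scaled to variance 40 would give, so the eigenvalue reading looks plausible. In the 11d case the last six values are numerically zero.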