## Friday, August 22, 2014

### Heatmaps for Unweighted 5d and 10d DA-SMACOF

The heatmaps below show the 5d and 10d data after mapping with unweighted DA-SMACOF.

• 5 Dimensional Data
• 10 Dimensional Data

### Add Z_VI and Z_PIPELINE as Additional Dimension Run

Data Statistics
• Original line count: 321729
• Columns 5 to 14 are 10 columns holding 5 dimensions and their 5 errors (distance 5d and error 5d).
• Columns 20 to 39 are 20 columns holding 10 dimensions and their 10 errors (distance 10d and error 10d).
• Columns 18 and 19 are classification information (by person and by machine).
• Columns 15 and 16 are two z values, denoted z_vi and z_pipeline.
• Column 17 is the error measurement for the z value.
Testing Environment:
• FutureGrid xRay for WDA-SMACOF
• Local Machine for PCA
Testing Steps:
1. Randomize the data.
2. Clean the data by removing rows whose errors are smaller than 0 (total size is now 321719).
3. Clean the data by removing rows with z_warning > 0 (total size is now 260789).
4. Whiten the data by applying (x - mean) / std to each dimension, where x is the value in that dimension.
5. Take the first 100k points.
6. Calculate distance 5d plus z_vi (distance 6d), error 5d, distance 10d plus z_vi (distance 11d), and error 10d. The error is given as e(i, j) = sqrt(sum(error(i, k)^2 + error(j, k)^2)), where i and j are two points and k is the dimension, running from 1 to 5 or from 1 to 10. The weight is then w(i, j) = 1 / e(i, j)^2.
   1. Note: in some rows z_vi = -1. Since z_vi should always be larger than or at least close to zero, z_vi is replaced with the value from z_pipeline in all of those rows.
   2. Scale the z_vi column so that its contribution to the squared distance is scaled by 40: for both PCA and WDA-SMACOF, z_vi is multiplied by sqrt(40).
7. Apply WDA-SMACOF on 6d and 11d data using 1/error as weighting.
8. Apply WDA-SMACOF on 6d and 11d data using Sammon's Mapping.
9. Apply PCA on these datasets.
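The z-related preprocessing in the steps above (the -1 replacement, the standardization, and the sqrt(40) scaling) can be sketched in plain Python. This is a minimal illustration, not the code that was run: the function names are hypothetical, the sample values are made up, the population standard deviation is an assumption, and so is the order of operations (the placeholder is filled before standardizing, because a -1 marker would no longer be recognizable afterwards).

```python
import math
import statistics

Z_SCALE = 40.0  # factor applied to the squared z distance (step 6.2)


def fill_z_vi(z_vi, z_pipeline):
    """Replace the z_vi = -1 placeholder with the z_pipeline value of the same row."""
    return [zp if zv == -1 else zv for zv, zp in zip(z_vi, z_pipeline)]


def standardize(column):
    """Return (x - mean) / std for every value in one column."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # assumption: population std, the post does not say
    return [(x - mean) / std for x in column]


def scale_z(z_column, scale=Z_SCALE):
    """Multiply by sqrt(scale) so that squared differences in z grow by `scale`."""
    factor = math.sqrt(scale)
    return [z * factor for z in z_column]


# Made-up sample rows, not values from the quasar file.
z_vi = [2.1, -1.0, 0.4, 3.0]
z_pipeline = [2.0, 1.5, 0.5, 2.9]

filled = fill_z_vi(z_vi, z_pipeline)
z = scale_z(standardize(filled))
```

After these three calls the z column has variance 40 while every other standardized dimension has variance 1, which is why multiplying by sqrt(40) rather than 40 gives z a factor of 40 in the squared distance.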
The Cluster Information:
• Person
  • 1: QSO_BAL.
  • 2: STAR
  • 3: QSO
  • 4: GALAXY
  • 5: Unknown
  • 6: QSO?
  • 7: QSOz?
  • 8: NotQuasar
  • 9: NotInspected
  • 10: STAR?
• Machine
  • 1: QSO
  • 2: STAR
  • 3: GALAXY

Testing Result (all mappings use the person classification, with scaling applied and with z_vi replaced by z_pipeline where z_vi = -1):

1. WDA-SMACOF using 1/error as Weighting:

2. WDA-SMACOF using Sammon's Mapping:

3. PCA (100k data result)
   1. 6d: 40.06115152 4.06885203 0.63975502 0.15969881 0.05026634 0.02027628
   2. 11d: 4.014067e+01 5.605559e+00 2.489067e+00 1.122345e+00 6.423549e-01 2.438484e-28 1.761722e-28 3.208711e-29 2.569735e-30 1.539248e-32 1.454250e-32

### Initial Test of Quasar Data

Data Statistics
• Original line count: 321729
• Columns 5 to 14 are 10 columns holding 5 dimensions and their 5 errors (distance 5d).
• Columns 20 to 39 are 20 columns holding 10 dimensions and their 10 errors (distance 10d).
• Columns 18 and 19 are classification information (by person and by machine).
• Columns 15 and 16 are two z values, denoted z_vi and z_pipeline.
• Column 17 is the error measurement for the z value.
Testing Environment:
• FutureGrid xRay for WDA-SMACOF
• Local Machine for PCA
Testing Steps:
1. Randomize the data.
2. Clean the data by removing rows whose errors are smaller than 0 (total size is now 321719).
3. Take the first 100k points.
4. Calculate distance 5d, error 5d, distance 10d, and error 10d. The error is given as e(i, j) = sqrt(sum(error(i, k)^2 + error(j, k)^2)), where i and j are two points and k is the dimension, running from 1 to 5 or from 1 to 10. The weight is then w(i, j) = 1 / e(i, j)^2.
5. Apply WDA-SMACOF on the 5d and 10d data with weighting.
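The pairwise error and weight formulas from step 4 translate directly into code. The sketch below is only an illustration of those two formulas: the function names are hypothetical, the sample error vectors are made up, and it is not the implementation used to build the actual weight matrices.

```python
import math


def pair_error(err_i, err_j):
    """e(i, j) = sqrt(sum over k of err_i[k]^2 + err_j[k]^2)."""
    return math.sqrt(sum(a * a + b * b for a, b in zip(err_i, err_j)))


def pair_weight(err_i, err_j):
    """w(i, j) = 1 / e(i, j)^2."""
    e = pair_error(err_i, err_j)
    return 1.0 / (e * e)


def euclidean(x_i, x_j):
    """Plain Euclidean distance between two points with the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


# Made-up 5d error vectors for two points.
err_i = [3.0, 0.0, 0.0, 0.0, 0.0]
err_j = [4.0, 0.0, 0.0, 0.0, 0.0]

e_ij = pair_error(err_i, err_j)   # 3-4-5 triangle, so the combined error is 5
w_ij = pair_weight(err_i, err_j)
```

The cleaning step removes only negative errors, so a pair whose errors are all exactly zero would make `pair_weight` divide by zero. How such pairs were handled is not stated in the post.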
The Cluster Information:
• Person
  • 1: QSO_BAL.
  • 2: STAR
  • 3: QSO
  • 4: GALAXY
  • 5: Unknown
  • 6: QSO?
  • 7: QSOz?
  • 8: NotQuasar
  • 9: NotInspected
  • 10: STAR?
• Machine
  • 1: QSO
  • 2: STAR
  • 3: GALAXY
The Experiment Result:
1. WDA-SMACOF without weighting

2. WDA-SMACOF with 1 / error as weighting

3. PCA
   1. 5d: 4.16070966 0.60714097 0.15921712 0.05227772 0.02065453
   2. 10d: 5.591213e+00 2.533358e+00 1.185368e+00 6.900616e-01 2.559190e-27 6.513998e-28 3.698124e-28 6.921356e-29 1.392885e-32 1.345212e-32
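The post does not label the PCA numbers. The 5d values sum to about 5 and the 10d values to about 10, which is consistent with per-component variances, so the sketch below treats them that way and turns them into fractions of total variance. That reading is an assumption, and `explained_ratio` is a hypothetical helper name.

```python
def explained_ratio(variances):
    """Return each component's share of the total variance."""
    total = sum(variances)
    return [v / total for v in variances]


# Values copied from the PCA result lines above.
pca_5d = [4.16070966, 0.60714097, 0.15921712, 0.05227772, 0.02065453]
pca_10d = [
    5.591213e+00, 2.533358e+00, 1.185368e+00, 6.900616e-01, 2.559190e-27,
    6.513998e-28, 3.698124e-28, 6.921356e-29, 1.392885e-32, 1.345212e-32,
]

ratio_5d = explained_ratio(pca_5d)
ratio_10d = explained_ratio(pca_10d)

# Share of variance carried by the first four 10d components.
top4_10d = sum(ratio_10d[:4])
```

Under this reading, the first component carries more than 80% of the 5d variance, and the 10d data has only four components with non-negligible variance. The remaining six are at the level of floating-point noise.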