Friday, August 22, 2014

Add Z_VI and Z_PIPELINE as Additional Dimension Run

Data Statistics
  • Original line count: 321729
  • Columns 5 to 14 are 10 columns holding 5 dimensions and 5 errors (distance 5d and error 5d).
  • Columns 20 to 39 are 20 columns holding 10 dimensions and 10 errors (distance 10d and error 10d).
  • Columns 18 and 19 are classification information (by person or by machine).
  • Columns 15 and 16 are two z values, denoted z_vi and z_pipeline.
  • Column 17 is the error measurement for the z value.
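
As a rough illustration of the column layout above, the following sketch slices a table into named pieces. It assumes the column numbers in the list are 1-based and that the data is already loaded as a 2-D NumPy array. The names in COLS and the helper split_columns are hypothetical, and the order of dimensions and errors inside each block is not stated in this post, so the blocks are left unsplit.

```python
import numpy as np

# 1-based column ranges from the list above, converted to 0-based indices.
# All names here are illustrative and do not come from the original data file.
COLS = {
    "block_5d": slice(4, 14),     # columns 5-14: 5 dimensions + 5 errors
    "z_vi": 14,                   # column 15
    "z_pipeline": 15,             # column 16
    "z_error": 16,                # column 17
    "class_info": slice(17, 19),  # columns 18-19
    "block_10d": slice(19, 39),   # columns 20-39: 10 dimensions + 10 errors
}

def split_columns(table):
    """Split a 2-D array with at least 39 columns into named pieces."""
    table = np.asarray(table)
    return {name: table[:, idx] for name, idx in COLS.items()}
```

Calling split_columns(table)["block_10d"] on a table with 39 columns would return an array with 20 columns.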
Testing Environment:
  • FutureGrid xRay for WDA-SMACOF
  • Local Machine for PCA
Testing Steps:
  1. Randomize the data.
  2. Clean the data by removing rows with errors smaller than 0 (total size is now 321719).
  3. Clean the data by removing rows with z_warning > 0 (total size is now 260789).
  4. Whiten the data by applying (x - mean) / std to all dimensions.
  5. Take the first 100k points.
  6. Calculate distance 5d plus z_vi (distance 6d), error 5d, distance 10d plus z_vi (distance 11d), and error 10d. The error is given as e(i, j) = sqrt(sum_k(error(i, k)^2 + error(j, k)^2)), where i and j are two points and k is the dimension, running from 1 to 5 or from 1 to 10. The weight can then be calculated as w(i, j) = 1 / e(i, j)^2.
    1. Note that in some cases z_vi = -1. Since z_vi should always be larger than or at least close to zero, z_vi is replaced with the value from z_pipeline in all of those cases.
    2. Scale the distance contribution of the z_vi column by 40. For PCA and WDA-SMACOF, this means using z_vi * sqrt(40).
  7. Apply WDA-SMACOF on 6d and 11d data using 1/error as weighting.
  8. Apply WDA-SMACOF on 6d and 11d data using Sammon's Mapping.
  9. Apply PCA to these datasets.
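
The preprocessing in steps 4 and 6 can be sketched roughly as follows. This is a minimal NumPy sketch on invented toy data, not the code used for the run: all function names are hypothetical, and the order in which the z_vi replacement, the standardization and the sqrt(40) scaling are applied is an assumption, since the steps above do not pin it down. The full pairwise matrices are only practical for small samples.

```python
import numpy as np

def standardize(x):
    """Step 4: (x - mean) / std, applied column by column."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def fill_z_vi(z_vi, z_pipeline):
    """Step 6.1: use z_pipeline wherever z_vi == -1."""
    z_vi = np.asarray(z_vi, dtype=float)
    return np.where(z_vi == -1, z_pipeline, z_vi)

def pairwise_distance(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pairwise_error(errors):
    """e(i, j) = sqrt(sum_k(error(i, k)^2 + error(j, k)^2))."""
    sq = (np.asarray(errors, dtype=float) ** 2).sum(axis=1)
    return np.sqrt(sq[:, None] + sq[None, :])

def weights_from_error(e):
    """w(i, j) = 1 / e(i, j)^2."""
    return 1.0 / e ** 2

# Toy data: 3 points, 2 dimensions plus a z column (all values invented).
dims = np.array([[0.0, 1.0], [2.0, 1.0], [4.0, 4.0]])
errs = np.array([[0.1, 0.2], [0.2, 0.1], [0.3, 0.3]])
z = fill_z_vi([0.5, -1.0, 2.0], [0.4, 1.2, 2.1])

# Assumed order: replace z_vi, standardize, then scale z by sqrt(40).
features = np.column_stack([standardize(dims), standardize(z) * np.sqrt(40)])
dist = pairwise_distance(features)
weight = weights_from_error(pairwise_error(errs))
```

Multiplying the z column by sqrt(40) multiplies its contribution to every squared distance by 40, which is one way to read the scaling in step 6.2.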
The Cluster Information:
  • Person
    • 1: QSO_BAL.
    • 2: STAR
    • 3: QSO
    • 4: GALAXY
    • 5: Unknown
    • 6: QSO?
    • 7: QSOz?
    • 8: NotQuasar
    • 9: NotInspected
    • 10: STAR?
  • Machine
    • 1: QSO
    • 2: STAR
    • 3: GALAXY

Testing Result (all mappings use the Person classification, with scaling, and with z_vi replaced by z_pipeline):

1. WDA-SMACOF using 1/error as Weighting:

2. WDA-SMACOF using Sammon's Mapping:

3. PCA (100k data result)
    1. 6d: 40.06115152 4.06885203 0.63975502 0.15969881 0.05026634 0.02027628
    2. 11d: 4.014067e+01 5.605559e+00 2.489067e+00 1.122345e+00 6.423549e-01 2.438484e-28 1.761722e-28 3.208711e-29 2.569735e-30 1.539248e-32 1.454250e-32
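
This post does not say how the numbers above were produced. If they are the variances along each principal component (an assumption), values of that kind can be obtained as the eigenvalues of the covariance matrix. The helper pca_variances below is a hypothetical sketch of that calculation, not the code used for this run.

```python
import numpy as np

def pca_variances(data):
    """Return the eigenvalues of the sample covariance matrix, largest first."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (len(data) - 1)
    # eigvalsh returns eigenvalues in ascending order, so reverse them.
    return np.linalg.eigvalsh(cov)[::-1]
```

For a 6-column input this returns 6 values and for an 11-column input 11 values, one per principal component.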
