Friday, August 22, 2014

Initial Test of Quasar Data

Data Statistics
  • Original line count: 321729
  • Columns 5 to 14 are 10 columns holding 5 dimensions and their 5 errors (used for the 5d distance).
  • Columns 20 to 39 are 20 columns holding 10 dimensions and their 10 errors (used for the 10d distance).
  • Columns 18 and 19 hold the classification information (assigned by a person or by a machine).
  • Columns 15 and 16 hold two z values, denoted z_vi and z_pipeline.
  • Column 17 is the error of the z value.
Testing Environment:
  • FutureGrid xRay for WDA-SMACOF
  • Local Machine for PCA
Testing Steps:
  1. Randomize the data.
  2. Clean the data by removing rows with errors smaller than 0 (321719 rows remain).
  3. Take first 100k.
  4. Calculate the 5d distance, 5d error, 10d distance and 10d error. The error is given by e(i, j) = sqrt(sum_k(error(i, k)^2 + error(j, k)^2)), where i and j index two points and k runs over the dimensions, from 1 to 5 or from 1 to 10. The weight is then w(i, j) = 1 / e(i, j)^2.
  5. Apply WDA-SMACOF to the 5d and 10d data with weighting.
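Steps 1 to 4 can be sketched in NumPy as follows. This is a minimal illustration, not the code used for the test: the function names (`prepare`, `pairwise_distance`, `pairwise_error_and_weight`) are hypothetical, the distance is assumed to be Euclidean, and the dense n x n matrices are only practical for a small sample, not for the full 100k points.

```python
import numpy as np

def prepare(values, errors, n_keep, seed=0):
    """Steps 1-3: shuffle rows, drop rows with a negative error, keep the first n_keep."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(values))           # step 1: randomize
    values, errors = values[order], errors[order]
    keep = np.all(errors >= 0, axis=1)             # step 2: remove rows with errors < 0
    values, errors = values[keep], errors[keep]
    return values[:n_keep], errors[:n_keep]        # step 3: take the first n_keep rows

def pairwise_distance(values):
    """Step 4 (distance): Euclidean distance between all pairs (an assumption)."""
    diff = values[:, None, :] - values[None, :, :]
    return np.linalg.norm(diff, axis=2)

def pairwise_error_and_weight(errors):
    """Step 4 (error, weight): e(i, j) = sqrt(sum_k(error(i, k)^2 + error(j, k)^2)),
    w(i, j) = 1 / e(i, j)^2."""
    sq = np.sum(errors ** 2, axis=1)               # sum_k error(i, k)^2 for each point
    e2 = sq[:, None] + sq[None, :]                 # e(i, j)^2
    with np.errstate(divide="ignore"):             # zero errors would give infinite weight
        w = 1.0 / e2
    return np.sqrt(e2), w
```

The sum over k splits into one term per point, so the per-point sums of squared errors only need to be computed once. Note that the cleaning step removes negative errors but not zero errors; a pair of points whose errors are all zero would get an infinite weight here.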
The Cluster Information:
  • Person
    • 1: QSO_BAL.
    • 2: STAR
    • 3: QSO
    • 4: GALAXY
    • 5: Unknown
    • 6: QSO?
    • 7: QSOz?
    • 8: NotQuasar
    • 9: NotInspected
    • 10: STAR?
  • Machine
    • 1: QSO
    • 2: STAR
    • 3: GALAXY
The Experiment Result:
1. WDA-SMACOF without weighting


2. WDA-SMACOF with 1 / error as weighting


3. PCA
    1. 5d: 4.16070966 0.60714097 0.15921712 0.05227772 0.02065453
    2. 10d: 5.591213e+00 2.533358e+00 1.185368e+00 6.900616e-01 2.559190e-27 6.513998e-28 3.698124e-28 6.921356e-29 1.392885e-32 1.345212e-32
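The post does not say how these PCA values were produced, or whether they are variances or standard deviations. As a minimal sketch, assuming they are per-component variances of the mean-centered data, they could be computed with NumPy like this (`pca_variances` is a hypothetical helper, not the original script):

```python
import numpy as np

def pca_variances(x):
    """Return the variance along each principal component, largest first."""
    xc = x - x.mean(axis=0)                    # center each column
    s = np.linalg.svd(xc, compute_uv=False)    # singular values, sorted descending
    return s ** 2 / (len(x) - 1)               # eigenvalues of the sample covariance
```

A value near zero means the data has no spread along that component. In the 10d list above, the last six values are below 1e-26, that is, zero to numerical precision.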

