SunSpider/TODO

* Add more test cases. Categories we'd like to cover (with reasonably
  real-world tests, preferably not microbenchmarks) include
  (X marks the ones that are fairly well covered now):

    X math (general)
    X bitops
    X 3-d (the math bits)
    - crypto / encoding
    X string processing
    - regexps
    - date processing
    - array processing
    - control flow
    - function calls / recursion
    - object access (unclear if it is possible to make a realistic
      benchmark that isolates this)

  I'd specifically like to add all the Computer Language Shootout
  tests that Mozilla is using.
       
* Normalize tests. Most of the available test cases have a repeat
  count of some sort, so the time they take can be tuned. The tests
  should be tuned so that each category contributes about the same
  total time, and each test within a category contributes about the
  same amount; a sketch of the tuning arithmetic follows this item.
  The open question is which implementation should be the baseline.
  My current thought is to either pick a specific browser on a
  specific platform (IE 7 or Firefox 2, perhaps), or to target the
  average that some set of same-generation release browsers gets on
  each test. The latter is more work. IE 7 is probably a reasonable
  normalization target: it is the latest version of the most popular
  browser, so results on this benchmark will tell you how much you
  stand to gain or lose by using a different browser.
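
  A rough sketch of that tuning arithmetic, with a hypothetical
  per-test time budget (neither the function nor the numbers below
  are actual harness code):

      // Given how long one iteration takes on the chosen baseline
      // browser, pick a repeat count so each test contributes about
      // the same amount of time to its category.
      function tunedRepeatCount(baselineMsPerIteration, targetMsPerTest) {
          return Math.max(1, Math.round(targetMsPerTest / baselineMsPerIteration));
      }

      // Example: a category meant to total ~400ms across 4 tests gives
      // each test a ~100ms budget on the baseline browser.
      var repeats = tunedRepeatCount(12.5, 100); // 8 iterations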
       
* Instead of using the standard error with the normal-distribution
  multiplier of 1.96, the correct way to calculate a 95% confidence
  interval for a small sample is to use Student's t-distribution
  <http://en.wikipedia.org/wiki/Student%27s_t-test>. Basically, this
  involves multiplying the standard error by a value from a
  two-tailed t-distribution table, with n-1 degrees of freedom,
  rather than by 1.96; a table is available at
  <http://www.medcalc.be/manual/t-distribution.php>.
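
  A sketch of that calculation, assuming samples stay small (the
  function and its hard-coded table, which only covers 2 to 11
  samples, are illustrative rather than actual harness code):

      // 95% confidence interval for a small sample, using two-tailed
      // critical values of Student's t-distribution in place of 1.96.
      function confidenceInterval95(samples) {
          var n = samples.length;
          var mean = 0;
          for (var i = 0; i < n; i++)
              mean += samples[i];
          mean /= n;

          var sumSquares = 0;
          for (var i = 0; i < n; i++)
              sumSquares += (samples[i] - mean) * (samples[i] - mean);
          var stdErr = Math.sqrt(sumSquares / (n - 1)) / Math.sqrt(n);

          // Two-tailed 95% critical values for 1..10 degrees of freedom.
          var tTable = [12.71, 4.30, 3.18, 2.78, 2.57,
                        2.45, 2.36, 2.31, 2.26, 2.23];
          var t = tTable[n - 2]; // n - 1 degrees of freedom, 0-indexed

          return { mean: mean, plusMinus: t * stdErr };
      }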
       
* Add support to compare two different engines (or two builds of the
  same engine) interleaved.
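
  A sketch of what "interleaved" could mean in practice: alternate
  the engines on every pass, so system noise midway through a run
  affects both about equally (runSuite and compareInterleaved are
  hypothetical stand-ins, not actual harness code):

      // Time one full pass of the suite under the given engine; a
      // real wrapper would shell out to the engine binary here.
      function runSuite(engineCommand) {
          var start = new Date();
          // ... run the tests with engineCommand ...
          return new Date() - start; // elapsed milliseconds
      }

      function compareInterleaved(engineA, engineB, passes) {
          var timesA = [], timesB = [];
          for (var i = 0; i < passes; i++) {
              timesA.push(runSuite(engineA)); // each pass: A first,
              timesB.push(runSuite(engineB)); // then B
          }
          return { a: timesA, b: timesB };
      }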
       
* Add support to compare two existing sets of saved results.

* Allow repeat count to be controlled from the browser-hosted version
  and the WebKitTools wrapper script.

* Add support to run only a subset of the tests (both command-line and
  web versions).

* Add a profile mode for the command-line version that runs the tests
  repeatedly in the same command-line interpreter instance, for ease
  of profiling.
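
  A sketch of the profile-mode loop, with a hypothetical test list and
  runner (tests, runTest, and the iteration count are illustrative):

      // Run the whole suite many times inside a single interpreter
      // instance, so a profiler sees one long, steady workload
      // instead of many short-lived processes.
      var tests = [ /* ... the suite's test functions ... */ ];
      function runTest(test) { test(); }

      var profileIterations = 100;
      for (var i = 0; i < profileIterations; i++) {
          for (var j = 0; j < tests.length; j++)
              runTest(tests[j]);
      }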
       
* Make the browser-hosted version prettier, both in general design and
  maybe using bar graphs for the output.

* Make it possible to track change over time and generate a graph per
  result showing result and error bar for each version.

* Hook up to automated testing / buildbot infrastructure.

* Possibly... add the ability to download iBench from its original
  server, pull out the JS test content, preprocess it, and add it as a
  category to the benchmark.

* Profit.