* Add more test cases. Categories we'd like to cover (with reasonably
real-world tests, preferably not microbenchmarks) include:

(X marks the ones that are fairly well covered now).

X math (general)
X bitops
X 3-d (the math bits)
- crypto / encoding
X string processing
- regexps
- date processing
- array processing
- control flow
- function calls / recursion
- object access (unclear if it is possible to make a realistic
benchmark that isolates this)

I'd specifically like to add all the computer language shootout
tests that Mozilla is using.

* Normalize tests. Most of the test cases available have a repeat
count of some sort, so the time they take can be tuned. The tests
should be tuned so that each category contributes about the same
total, and so each test in each category contributes about the same
amount. The question is, what implementation should be the baseline?
My current thought is to either pick some specific browser on a
specific platform (IE 7 or Firefox 2 perhaps), or try to target the
average that some set of same-generation release browsers get on
each test. The latter is more work. IE 7 is probably a reasonable
normalization target since it is the latest version of the most
popular browser, so results on this benchmark will tell you how much
you have to gain or lose by using a different browser.

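A rough sketch of the tuning computation (the function name and data
shapes are illustrative, not part of the harness): given each test's
measured time on the chosen baseline, grouped by category, and a total
target time for the whole suite, it computes the factor by which to
scale each test's repeat count.

    // Illustrative sketch only; baselineTimes maps category name ->
    // { testName: time in ms measured on the baseline browser }.
    function repeatCountScaleFactors(baselineTimes, totalTargetMs)
    {
        var categoryCount = 0;
        for (var category in baselineTimes)
            categoryCount++;
        var perCategoryMs = totalTargetMs / categoryCount;

        var factors = {};
        for (category in baselineTimes) {
            var tests = baselineTimes[category];
            var testCount = 0;
            for (var test in tests)
                testCount++;
            var perTestMs = perCategoryMs / testCount;
            for (test in tests)
                factors[test] = perTestMs / tests[test];
        }
        return factors; // multiply each test's repeat count by factors[testName]
    }

Note that the factors are exact only on the baseline browser; other
engines will still see somewhat different category weights, which is
inherent in picking a single normalization target.
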
* Instead of multiplying the standard error by 1.96, the correct way
to calculate a 95% confidence interval for a small sample is to use
the Student's t-distribution
<http://en.wikipedia.org/wiki/Student%27s_t-test>. Basically this
involves multiplying the standard error by a value from a 2-tailed
t-distribution table for n-1 degrees of freedom instead of by 1.96; a
table is available at
<http://www.medcalc.be/manual/t-distribution.php>.

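A minimal sketch of that calculation (purely illustrative, not the
harness's current code), assuming the per-test sample is an array of
run times in milliseconds and using standard two-tailed 95% critical
values for small degrees of freedom:

    // Illustrative only: 95% confidence interval using t-distribution
    // critical values (two-tailed, df = n - 1) instead of the normal 1.96.
    var tDist95 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365,
                   2.306, 2.262, 2.228]; // df = 1..10; extend for larger samples

    function confidenceInterval95(times)
    {
        var n = times.length; // needs n >= 2
        var sum = 0;
        for (var i = 0; i < n; i++)
            sum += times[i];
        var mean = sum / n;

        var squares = 0;
        for (i = 0; i < n; i++)
            squares += (times[i] - mean) * (times[i] - mean);
        var stdErr = Math.sqrt(squares / (n - 1)) / Math.sqrt(n);

        return { mean: mean, plusMinus: tDist95[n - 2] * stdErr };
    }
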
* Add support to compare two different engines (or two builds of the
same engine) interleaved.

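One possible shape for this (runTest and the result layout are
hypothetical, just to illustrate the interleaving): alternate between
the two engines on every test and repetition, so slow drift in machine
load affects both engines about equally.

    // Hypothetical sketch; runTest(engine, test) would launch the given
    // engine on one test file and return its time in milliseconds.
    function compareInterleaved(engineA, engineB, tests, repetitions)
    {
        var results = { a: {}, b: {} };
        for (var rep = 0; rep < repetitions; rep++) {
            for (var i = 0; i < tests.length; i++) {
                var test = tests[i];
                if (!results.a[test]) {
                    results.a[test] = [];
                    results.b[test] = [];
                }
                results.a[test].push(runTest(engineA, test));
                results.b[test].push(runTest(engineB, test));
            }
        }
        return results; // feed each pair of samples to the confidence-interval code above
    }
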
* Add support to compare two existing sets of saved results.

* Allow repeat count to be controlled from the browser-hosted version
and the WebKitTools wrapper script.

* Add support to run only a subset of the tests (both command-line and
web versions).

* Add a profile mode for the command-line version that runs the tests
repeatedly in the same command-line interpreter instance, for ease
of profiling.

* Make the browser-hosted version prettier, both in general design and
perhaps by using bar graphs for the output.

* Make it possible to track changes over time and generate a graph per
result showing the result and error bar for each version.

* Hook up to automated testing / buildbot infrastructure.

* Possibly... add the ability to download iBench from its original
server, pull out the JS test content, preprocess it, and add it as a
category to the benchmark.

* Profit.