* Add more test cases. Categories we'd like to cover (with reasonably
real-world tests, preferably not microbenchmarks) include:

(X marks the ones that are fairly well covered now).

X math (general)
X bitops
X 3-d (the math bits)
- crypto / encoding
X string processing
- regexps
- date processing
- array processing
- control flow
- function calls / recursion
- object access (unclear if it is possible to make a realistic
benchmark that isolates this)

I'd specifically like to add all the computer language shootout
tests that Mozilla is using.

* Normalize tests. Most of the test cases available have a repeat
count of some sort, so the time they take can be tuned. The tests
should be tuned so that each category contributes about the same
total, and so each test in each category contributes about the same
amount. The question is, what implementation should be the baseline?
My current thought is to either pick some specific browser on a
specific platform (IE 7 or Firefox 2 perhaps), or try to target the
average that some set of same-generation release browsers get on
each test. The latter is more work. IE 7 is probably a reasonable
normalization target since it is the latest version of the most
popular browser, so results on this benchmark will tell you how much
you have to gain or lose by using a different browser.

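A rough sketch of the tuning computation (the function name and data
shapes are illustrative, not part of the harness): given each test's
measured time on the chosen baseline, grouped by category, and a total
target time for the whole suite, it computes the factor by which to
scale each test's repeat count.

    // Illustrative sketch only; baselineTimes maps category name ->
    // { testName: time in ms measured on the baseline browser }.
    function repeatCountScaleFactors(baselineTimes, totalTargetMs)
    {
        var categoryCount = 0;
        for (var category in baselineTimes)
            categoryCount++;
        var perCategoryMs = totalTargetMs / categoryCount;

        var factors = {};
        for (category in baselineTimes) {
            var tests = baselineTimes[category];
            var testCount = 0;
            for (var test in tests)
                testCount++;
            var perTestMs = perCategoryMs / testCount;
            for (test in tests)
                factors[test] = perTestMs / tests[test];
        }
        return factors; // multiply each test's repeat count by factors[testName]
    }

Note that the factors are exact only on the baseline browser; other
engines will still see somewhat different category weights, which is
inherent in picking a single normalization target.
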
* Instead of multiplying the standard error by 1.96, the correct way
to calculate a 95% confidence interval for a small sample is to use
the Student's t-distribution
<http://en.wikipedia.org/wiki/Student%27s_t-test>. Basically this
involves multiplying the standard error by a value from a 2-tailed
t-distribution table for n-1 degrees of freedom instead of by 1.96; a
table is available at
<http://www.medcalc.be/manual/t-distribution.php>.

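A minimal sketch of that calculation (purely illustrative, not the
harness's current code), assuming the per-test sample is an array of
run times in milliseconds and using standard two-tailed 95% critical
values for small degrees of freedom:

    // Illustrative only: 95% confidence interval using t-distribution
    // critical values (two-tailed, df = n - 1) instead of the normal 1.96.
    var tDist95 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365,
                   2.306, 2.262, 2.228]; // df = 1..10; extend for larger samples

    function confidenceInterval95(times)
    {
        var n = times.length; // needs n >= 2
        var sum = 0;
        for (var i = 0; i < n; i++)
            sum += times[i];
        var mean = sum / n;

        var squares = 0;
        for (i = 0; i < n; i++)
            squares += (times[i] - mean) * (times[i] - mean);
        var stdErr = Math.sqrt(squares / (n - 1)) / Math.sqrt(n);

        return { mean: mean, plusMinus: tDist95[n - 2] * stdErr };
    }
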
* Add support to compare two different engines (or two builds of the
same engine) interleaved.

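One possible shape for this (runTest and the result layout are
hypothetical, just to illustrate the interleaving): alternate between
the two engines on every test and repetition, so slow drift in machine
load affects both engines about equally.

    // Hypothetical sketch; runTest(engine, test) would launch the given
    // engine on one test file and return its time in milliseconds.
    function compareInterleaved(engineA, engineB, tests, repetitions)
    {
        var results = { a: {}, b: {} };
        for (var rep = 0; rep < repetitions; rep++) {
            for (var i = 0; i < tests.length; i++) {
                var test = tests[i];
                if (!results.a[test]) {
                    results.a[test] = [];
                    results.b[test] = [];
                }
                results.a[test].push(runTest(engineA, test));
                results.b[test].push(runTest(engineB, test));
            }
        }
        return results; // feed each pair of samples to the confidence-interval code above
    }
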
* Add support to compare two existing sets of saved results.

* Allow repeat count to be controlled from the browser-hosted version
and the WebKitTools wrapper script.

* Add support to run only a subset of the tests (both command-line and
web versions).

* Add a profile mode for the command-line version that runs the tests
repeatedly in the same command-line interpreter instance, for ease
of profiling.

* Make the browser-hosted version prettier, both in general design and
perhaps by using bar graphs for the output.

* Make it possible to track changes over time and generate a graph per
result showing the result and error bar for each version.

* Hook up to automated testing / buildbot infrastructure.

* Possibly... add the ability to download iBench from its original
server, pull out the JS test content, preprocess it, and add it as a
category to the benchmark.

* Profit.