| ulevs4bsd | ulevs4bsd 2011/12/21 08:24 current | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== FreeBSD 4BSD vs ULE Scheduler ====== | ||
| + | |||
| + | There has recently been an extremely long thread, //[[http://lists.freebsd.org/pipermail/freebsd-stable/2011-December/064773.html|SCHED_ULE should not be the default]]//, spread across the [[http://lists.freebsd.org/pipermail/freebsd-stable|FreeBSD-stable]] and [[http://lists.freebsd.org/pipermail/freebsd-current|FreeBSD-current]] mailing lists. The thread contains many claims and counter-claims but very little hard evidence to back them up. | ||
| + | |||
| + | ===== Summary ===== | ||
| + | |||
| + | Which scheduler performs better depends on both the number of processes and the working-set size. | ||
| + | |||
| + | ===== Details ===== | ||
| + | |||
| + | The following represents the results of a synthetic benchmark run on a 16-core [[http://docs.oracle.com/cd/E19095-01/sfv890.srvr/index.html|SunFire V890]] server((16 1350MHz UltraSPARC-IV CPUs and 64GB RAM)) running FreeBSD 10-current((the CVS equivalent of r227746 with a few local mods provided by marius@ and mjacob@ to identify some issues with igsp(4) and schizo interrupt handling)). A dmesg can be found [[HERE|HERE]]. This is a server-grade NUMA SPARC system so the results aren't necessarily comparable with a multicore x86 desktop system. | ||
| + | |||
| + | The benchmark runs multiple copies (processes) of a core loop that repeatedly cycles through an array of doubles (to provide a pre-defined working-set size), incrementing each element. The source code can be found at XXX. The tests were run in single-user mode under both the 4BSD and ULE schedulers. In each case, the test was run 5 times with 1, 2, 4, 6, 8, 10, 12, 14, 15, 16, 17, 18, 20, 24, 28, 31, 32, 33, 36, 40, 48, 56 and 64 processes and working-set sizes of: | ||
| + | * 1KB - Everything fits into L1 (5e8 cycles) | ||
| + | * 4MB - At least 2 processes fit into L2 (1.2e9 cycles) | ||
| + | * 32MB - All processes are cache-busting (1.6e9 cycles) | ||
| + | The number of iterations was chosen so that a single process took approximately 20s. | ||