Xorshift RNGs

Description of a class of simple, extremely fast random number generators (RNGs) with periods 2 k − 1 for k = 32 , 64 , 96 , 128 , 160 , 192. These RNGs seem to pass tests of randomness very well.


Introduction
A xorshift random number generator (xorshift RNG) produces a sequence of 2 32 −1 integers x, or a sequence of 2 64 −1 pairs x,y, or a sequence of 2 96 −1 triples x,y,z, etc., by means of repeated use of a simple computer construction: exclusive-or (xor) a computer word with a shifted version of itself. In C, the basic operation is yˆ(y<<a) for shifts left, yˆ(y>>a) for shifts right. (In Fortran, a single form, ieor(y,ishft(y,a)) will produce the desired result, with negative a for shifts right, nonnegative for shifts left.) Combining such xorshift operations for various shifts and arguments provides extremely fast and simple RNGs that seem to do very well on tests of randomness. To give an idea of the power and effectiveness of xorshift operations, here is the essential part of a C procedure that, with only three xorshift operations per call, will provide 2 128 −1 random 32-bit integers, given four random seeds x,y,z,w: tmp=(xˆ(x<<15)); x=y; y=z; z=w; return w=(wˆ(w>>21))ˆ(tmpˆ(tmp>>4)); Such a procedure is very fast, typically over 200 million/second, and the resulting random integers pass all the tests of randomness that have been applied to them, particularly the"tough" tests in [3] and the new version of the Diehard Battery [2].
Background theory for establishing the periods and choices that provide a variety of xorshift RNGs with periods up to 2 160 are given here. Longer periods are available, for example, from multiply-with-carry RNGs, but they use integer multiplication and require keeping a (sometimes large) table of the most recently generated values. More detailed comparisons are given in the final, summary section.

Theory
A mathematical model for most RNGs can be put in the following form: We have a seed set Z made up of m-tuples (x 1 , x 2 , . . . , x m ), and a one-to-one function f ( ) on Z. Most commonly, Z is just a set of integers, but for better RNGs, it may be a set of pairs, triples, etc. If z is chosen uniformly and randomly from Z, then the output of the RNG is the is itself uniform over Z, as is f 2 (z); indeed, each element of the sequence f (z), f 2 (z), . . . is uniformly distributed over the seed set Z, but they are not independent.
For xorshift RNGs, the seed set Z is the set of 1×n binary vectors, β = (b 1 , b 2 , . . . , b n ), excluding the zero vector. Usually, n will be 32, 64, 96, etc., so that its elements can be made up by adjoining 32-bit computer words. The elements of the vectors β in Z are in the field {0, 1}, so that addition of binary vectors can be implemented by xor'ing the constituent 32-bit parts. For our xorshift RNG, we need an invertible function over Z, and for that we use a linear transformation over the binary vector space, characterized by a nonsingular n ×n binary matrix T . If β is a uniform random choice, (the seed), from Z, then each member of the sequence βT, βT 2 , βT 3 , . . . is also uniformly distributed over Z, so we have a sequence of ID, Identically Distributed, uniform elements from Z, but they are not IID, that is, Independent Identically Distributed. But it turns out here, as for many RNGs, functions of the ID elements often have distributions very close to those of the same functions of elements of an IID sequence. That is the remarkable property of certain choices of functions f ( ) over seed sets Z that justifies their usefulness in computers for the past fifty years.

Matrices T that generate all non-null binary vectors
First given in [1], here is the main result: Theorem In order that a nonsingular n ×n binary matrix T produce all possible non-null 1×n binary vectors in the sequence β, βT, βT 2 , . . . for every non-null initial 1×n binary vector β, it is necessary and sufficient that, in the group of nonsingular n ×n binary matrices, the order of T is 2 n −1 Proof: First, the necessity: If the period of βT, βT 2 , . . . is k = 2 n −1 then βT k = β for every 1×n binary vector β, so the null space of the matrix T k + I is the whole space, and thus T k + I must be the zero matrix, that is, T k = I . If T j = I for some j < k, then the period of βT, βT 2 , . . . would be less than 2 n −1. Then the sufficiency: If the order of T is k = 2 n − 1, then the matrices T, T 2 , T 3 , . . . , T k are nonsingular and distinct, and through the characteristic polynomial of T and Euclid's algorithm, each of them can be represented as a polynomial in T of degree < n. Since there are k = 2 n −1 non-null polynomials in T of degree < n, they must be, in some order, the distinct nonsingular matrices T, T 2 , . . . , T k . In particular, if a polynomial in T is a singular matrix, then it must reduce, through T 's characteristic polynomial, to the zero matrix. It follows that the the period of βT, βT 2 , . . . must k = 2 n −1, because βT j = β for some non-null β and j < k would mean that T j + I is singular.

Application to Xorshift RNGs
For a binary vector y, the operation yT in a computer is likely to be expensive unless the matrix T has a special form.
If L is the n×n binary matrix that effects a left shift of one position on a binary vector y, that is, L is all 0's except for 1's on the principal subdiagonal, then, with T = I + L a , the xorshift operation in C, yˆ(y<<a) produces the linear transformation yT : add, mod 2, the binary vector y to a left-shift-a version of itself. Similarly, if R is the right-shift-1 matrix, (the transpose of L), then the xorshift operation yˆ(y>>b) may be used to form yT , with T = I + R b . The n × n matrices I + L a and I + R b are nonsingular, since, for example, L n = 0 and thus the finite series I + L a + L 2a + L 3a + . . . is the inverse of (I + L a ). So an obvious candidate for a matrix that has order 2 n −1 is Unfortunately, when n is 32 or 64, no choices for a and b will provide such a T with the required order. (A preliminary test on candidates is to square T n times. If the result is not T , then T cannot have order 2 n −1. By representing the rows of a binary matrix as a computer word, then xoring all of the rows for which a (left) multiplying binary vector has 1's, very fast binary matrix products can be evaluated with simple C or Fortran programs, and it only takes a few minutes to check every possible a, b in T = (I + L a )(I + R b ).) However, there are many choices for a, b, c for which the matrices T = (I + L a )(I + R b )(I + L c ) have order 2 n − 1, when n = 32 or n = 64. There are 81 triples (a, b, c), a < c, for which the 32 × 32 binary matrix Of those 81 triples with a < c, the triple (c, b, a) also provides a full period T , making 162 in all. In addition, another 162 full-period T 's arise from taking transposes: use triples (a, b, c) to form T = (I + R c )(I + L b )(I + R c ), for a total of 324. Then, finally, for each of the 324 T 's of the form (I + L a )(I + R b )(I + L c ) or (I + R a )(I + L b )(I + R c ), the corresponding matrices (I + L a )(I + L c )(I + R b ), and (I + R a )(I + R c )(I + L b ), also turn out to have period 2 32 −1, making a total of 648 choices. In summary, for each of the above 81 triples a, b, c with a < c, any one of these eight lines of C can serve as the basis for a 32-bit xorshift RNG with period 2 32 −1,: For 64-bit integers, the following 275 triples provide 64×64 matrices T = (I + L a )(I + R b )(I + L c ) whose order is 2 64 −1: As with the 32-bit case, a selection of any one of the 275 a, b, c choices for 64-bit sequences, and any one of the above eight lines of C code, will provide, for 64-bit words, a xorshift RNG with period 2 64 − 1, for a total of 8×275 = 2200 choices.
Here is a basic 32-bit xorshift C procedure that takes a 32-bit seed value y: unsigned long xor(){ static unsigned long y=2463534242; yˆ=(y<<13); y=(y>>17); return (yˆ=(y<<5)); } It uses one of my favorite choices, [a, b, c] = [13, 17, 5], and will pass almost all tests of randomness, except the binary rank test in Diehard [2]. (A long period xorshift RNG necessarily uses a nonsingular matrix transformation, so every successive n vectors must be linearly independent, while truly random binary vectors will be linearly independent only some 30% of the time.) Although I have only tested a few of them, any one of the 648 choices above is likely to provide a very fast, simple, high quality RNG.
For C compilers that have 64-bit integers, the following will provide an excellent period 2 64 −1 RNG, given a 64-bit seed x: unsigned long long xor64(){ static unsigned long long x=88172645463325252LL; xˆ=(x<<13); xˆ=(x>>7); return (xˆ=(x<<17)); but any of the above 2200 choices is likely to do as well. While it is convenient to use a 32-bit computer word to represent an element of a vector space of dimension 32, or dimension 64 for compilers that allow 64-bit integers, to get longer xorshift periods we need methods for representing elements of vector spaces of higher dimensions. A good way to do this is to allow, say, 1×96 vectors made up of 32-bit components (x, y, z) or 1×128 vectors with 32-bit components (x, y, z, w), etc. We are then faced with the problem of choosing matrices T that define linear transformations over such vector spaces.
Long periods may also be found by choosing the last column of the block companion matrix to be I + L a , I + R b , I + L c , I + R d as needed. Examples of the C generating code: t=(xˆ(x<<3))ˆ(yˆ(y>>19))ˆ(zˆ(z<<6));x=y;y=z;return(z=t); period 2 96 −1. t=(xˆ(x<<20))ˆ(yˆ(y>>11))ˆ(zˆ(z<<27))ˆ(wˆ(w>>6));x=y;y=z;z=w;return(w=t); period 2 128 −1. All of the above procedures return a 32-bit integer, although the procedures create sequences of pairs (x, y), or triples (x, y, z) or quadruples (z, y, z, w) of the maximal period, 2 n − 1, over binary vector spaces of dimension n = 64, 96, 128. For all of the above choices of parameters, the resulting xorshift RNGs pass all the tests in Diehard, and are exceptionally fast, all taking 4-6 nanoseconds, or about 200 million random numbers per second. They are so fast that a dominant part is the linkage: saving before-and restoring after-the registers used in the subroutine call.
Note that in the above, the last value: y in (x, y), z in (x, y, z), w in (x, y, z, w), is the returned value, but it need not be. If it is merely assigned, the full sequence of pairs, triples, quadruples will still be generated, but any function of those values could be returned. This provides a variety of possibilities for interesting RNGs. For example, merely returning 69069*x would provide a sort of congruential RNG with period 2 n −1, while a period of 2 32 (2 n −1) will result from combining (+ or xor) the regular xorshift with what I call a Weyl sequence: d+=362437; period 2 32 (with any odd constant replacing 362437). Here is an example: unsigned long xorwow(){ static unsigned long x=123456789,y=362436069,z=521288629, w=88675123,v=5783321, d=6615241; unsigned long t; t=(xˆ(x>>2)); x=y; y=z; z=w; w=v; v=(vˆ(v<<4))ˆ(tˆ(t<<1)); return (d+=362437)+v; } Simple and very fast (125 million/sec), the elements in its cycle of 2 192 −2 32 easily pass all the tests in Diehard.

Summary
A variety of simple and extremely fast RNGs can be developed by combining xorshift operations in different ways, yet in spite of their simplicity, the random numbers they produce do extremely well in tests of randomness. For the hundreds of choices of [a, b, c] given above, the simplest ones use the basic C operation xˆ=(x<<a);xˆ=(x>>b);(xˆ=(x<<c); to produce periods 2 32 −1 for 32-bit words, or period 2 64 −1 for 64-bit words. They are suitable by themselves or for use in combination with other methods.
A period of 2 32 is considered low by modern standards. With very little additional computer time, xorshift operations can be used to provide sequences of pairs (x, y) of period 2 64 − 1, triples (x, y, z) of period 2 96 − 1, or quadruples (x, y, z, w) of period 2 128 −1 or quintuples (x, y, z, w, v), period 2 192 −1. Extensions to higher k-tuples are feasible, but the general procedure: compute the new value as a function of the previous k values, then promote: x=y;y=z;z=w; etc. becomes less efficient than methods that keep a circular table of the k previously generated values. With such tables, the multiply-with-carry and complimentary-multiply-with-carry methods can provide periods as large as 2 131102 . I have described various versions through newsgroup postings, and a general description is in [4].
Both use just a few C instructions; the xorshift RNG has to keep the four most recent values; the MWC has to keep the three most recent as well as the latest carry c.