Saturday, March 12, 2011

Mathematical Interlude: Pseudo-Random Number Generators

Many of the projects that I develop for this site require the generation of test datasets.  Because real data has some degree of uncertainty in the measurements, good test data should include a sprinkling of randomness to test the robustness of analysis tools developed.  Therefore I feel the need to describe a little about the software tools for simulating randomness in data, pseudo-random number generators, or PRNGs.

I have encountered numerous pseudo-scientists who seem to latch onto any terminology scientists actually use with 'pseudo' in the name, such as pseudo-tensor, or pseudo-random number generator and try to leverage the terminology to insinuate the methodology is unreliable. 

Since pseudo-random number generators are routinely used by scientists in generating test datasets and testing their algorithms (and I will start using them in the next post), I felt the need to do a pre-emptive strike.

Why Do Scientists Need Random Number Generators?
As mentioned above, often times scientists need to develop software for data analysis to test their algorithms.  Sometimes they might add a little random noise to a simulation run to determine how these changes influence their results.  Sometimes they just want to create datasets of random noise to determine how such data might be distinguished from a real signal.

Why Are they Called PSEUDO-Random Number Generators (PRNG)?
  • While the sequences are not truly random, they do pass many of the tests for randomness
  • Initializing a PRNG with the same 'seed' makes the PRNG generate the same sequences of random numbers.  This is valuable for the development of reproducible test processes.
  • After a certain number of numbers are drawn from the generator with a given seed, the sequence will repeat.  The  length of this sequence is called its period.  Ideally, you want a PRNG with a LONG period.

How Good are Pseudo-Random Number Generators?
  • Modern PRNGs are very good, suitable for a wide variety of problem-solving and simulation applications.
  • The primary application for which PRNGs are NOT suitable is cryptography.  This is because the underlying engine of PRNGs is deterministic and it is, in principle, possible for a cracker to determine the underlying PRNG sequence with enough samples of cypher-text.
What PRNG is Used by the Python NumPy Module?

The Random module in Numpy uses the Mersenne Twister (Wikipedia) as its underlying engine for generating random numbers (Python Documentation, Statistics in Numpy & Scipy).  This is also the generator used in the standard Python library.  At one time, these PRNGs were different and this can generate some confusion in online searches.

How many random numbers are in the period used by the Mersenne Twister algorithm?
2**19937-1
>>> 2**19937-1
>>> 4315424797388162648055235516337919839053935043226711505165250541403330680137658
0911304513629318584665545269938257648835317902217334584413909528269154609168019007
8753437413962968019201144864809026614143184432769803000667281049840954515881760771
3296984376213462179039639134128520562761960051310664637664861599423667548653748024
1964350295935168662363909047948347692313978301377820785712419054474332844529183172
9732423108882650813216264694510777078122828294447750226804880578200287646593991647
6626520090056149580034405435369038986289406179287201112083361480844748291354732836
7277879565648307846909116945866230169702401260240187028746650033445774570315431292
9960251877807901193759028631710841496424733789862675033089613749057663409052895722
9001603800057163087519137397955504746815433325347499104624813250451634179655147057
5481459200859472614836213875557116864445789750886277996487304308450484223420629266
5185560243393391908443689210184248446770427276646018529149252772809226975384267702
5733392895440120546589561034765885538663390254628996213264328242574803578623358060
8154696546932563833327670769899439774888526687278527451002963059146963875715425735
5344759797344631006783673933274021499309687782967413915145996023742136298987206114
3141040214723899809096281891589064569393448333099416963229587799584899336674701487
1763494805549996163051541225403465297007721146231355704081493098663065733677191172
8539870957481678162560842128233801686253345864312540346708061352735432707144788768
6186198332077728064480669112571319726258176315131359642954776357636783701934983517
8462144294960757190918054625114143666384189433852576452289347652454631535740468786
2289458856546085620580424689873724369214450923153776984071681983765382377486141962
0704154810637936512319281799900662176646716711347163271548179587700538269439340040
3061700457691135349187874888923429349340145170571716181125795888889277495426977149
9145496239163940148229850253316515114312788020090568084565068188772666098316368838
8490562182226293398654864566908067219170474040889134983568566242806323119852043682
6329415290752972798343429446509992206368781367154091702655772727391329424277529349
0826005858847665231509574170778319100161684756856586731928608820701797603072698499
8735483604237173466025769434723550630174411887414129243895814154910060975221688223
0887611431996472330842380137110927449483557815037586849644585749917772869926744218
3696211376751010832785437940817490940910430840967741447084363242794768920562004272
2796163866914980548983112124467639993195537148401288636074870647956866904857478285
5217054740113945929622177502575565811067452201448981991968635965361551681273982740
7601388996388203187763036687627301575846400427988806918626402686126861808838749395
7381812502227968993026744625577395954246983163786300017127922715140603412990218157
0659650532600775823677398182129087394449859182749999007223592423334567850671186568
8391867477049600162775406253314406190191299837899147125153652003360579935086016788
0768756856237785709525554130490292719222018417250235712444991187021064269456506138
4919373474324503966267799038402386781686809962015879090586549423504699190743519551
0437225445157409678290843360259382257807308802738552615519720440756203267806244488
0349099823216123168779471561340579324954550952805251801012308725877897411581704824
5588971438596754408081313438375502988726739523375296641615501406091607983229239827
2406147832528924797165199369895191878086812211916417477109024806334910917048274412
2828118663244590714578713835123484226138007462191400481815238666604313334487506790
3582838283562688083236575482068479639546383819532174522502682372441363275765875609
1197836532983120667082171493167735643403792897243939867441398918554166122957393566
6861265827123469643837712283899804019973907806144367541567107846340467370240377765
3478173367084844734702056866636158138003692253382209909466469591930161626097920508
7421756703065051395428607508061598353575410321470950842784610567013677397949320242
0299870773101769258204621070221251412042932253043178961626704777611512359793540414
7084870985465426502772057300900333847905334250604119503030001704002887892941404603
3458699263675013550949427505525915816399805231906796107849935808966832992976812624
4231400865703342186809455174050644882903920731671130769513189229659350901862309481
0557519560305240787163809219164433754514863301000915916985856242176563624771328981
6785482462973762495302513603634127683664561750770319774575349128064331765399959943
4330811847014715871281614939442127661422826290995005574698105320661000156029578465
6616193252269412026831159508949671513845195883217147982748879261851417819979034417
2855986077272208666776804260903087546340467370240377765347817336708484473470205686
6636158138003692253382209909466469591930161626097920508742175670306505139542860750
8061598353575410321470950842784610567013677397949320242029987077310176925820462107
0221251412042932253043178961626704777611512359793540414708487098546542650277205730
0900333847905334250604119503030001704002887892941404603345869926367501355094942750
5525915816399805231906796107849935808966832992976812624423140086570334218680945517
4050644882903920731671130769513189229659350901862309481055751956030524078716380921
9164433754514863301000915916985856242176563624771328981678548246297376249530251360
3634127683664561750770319774575349128064331765399959943433081184701471587128161493
9442127661422826290995005574698105320661000156029578465661619325226941202683115950
8949671513845195883217147982748879261851417819979034417285598607727220866677680426
0903087548238033454465663056192413083744527546681430154877108777280110860043258922
6225941396828528349704557106275770142176156526272515340740762540514993198949445910
6414660534305378576709862520049864880961144869258603473714363659194013962706366851
3892996928694918051725568185082988249549548157960631695176587414201597987542734280
2672345248126356915730721315373978104162765371507859850415479728766312294671134815
8529418816432825044466692781137474494898385064375787507376496345148625306383391555
1456900878919553159944629444932352488175999071191357559333821217061914771850549366
3221115722292033114850248756330311801880568507356984158051811871077865395357129601
4372940865270407021924383167290323231567912289419486240594039074452321678019381871
2190921554607684445735785595136133042422061513564575139372709390097072378271012458
5383767833816102339758685489423069609154024998790745346131192396385295075475805820
5625956600817743007191746812655955021747670922460866747744520875607859062334750627
0983285934800677894561696024943928137634956575998474857735539909575573132008090408
3003644649221940993409694873054749430121616568675073574955588234030398987467297545
5060957736921559195480815514035915707129930057027117286252843197413312307617886797
5067842601954367603059903407084814646072789554954877421407535706212171982521929788
69786916734625618430175454903864111585429504569920905636741539030968041471L

(Wow!  Isn't Python's long integer type cool?)

That's 6002 digits, in case you were wondering, far more than a googol (Wikipedia), but smaller than a googolplex(Wikipedia).

If you called this generator one billion times per second (10^9 /second), you could run  the program for a year and you would have only used about 3.1x10^7 seconds * 10^9 seconds = 3.1 x 10^16 of the numbers in the cycle.  There is no concern of repetition of a cycle when using this PRNG.

A Simple Sample
To illustrate the use of the numpy random number generator, here's a simple python program that generates and plots 1500 random x,y pairs with values between zero and one. 

import numpy as np
from numpy import random
import matplotlib.pyplot as plt
points=random.uniform(0.0,1.0,size = (1500,2))
ax = plt.subplot(111,aspect='equal')
ax.scatter(points[:,0], points[:,1],color='r',marker='o',s=2,label='random')
plt.show() 
How many structures can YOU see in this plot?

For those who will still want to complain...
If you want to complain about PRNGs, then you must demonstrate that a true random number generator (TRNG) will generate significantly different results than a PRNG in the particular application.  This is the standard of evidence for anyone wishing to complain about the use of pseudo-random number generators.

Additional References
Next Weekend: Quantized Redshifts - Testing the Null Hypothesis

No comments: