Lazy Physics

John Jacobsen
Thursday, February 12, 2015

Home All Posts

Earlier post Fun with Instaparse code clojure

Later post Macro-writing Macros code clojure

… in which we explore lazy sequences and common functional idioms in Clojure via the example of looking for (nearly) coincident clusters of times in a series.

A fundamental technical problem in experimental particle physics is how to distinguish the signatures of particles from instrumental noise.

birds-on-wires.jpg

Imagine a tree full of hundreds of sparrows, each nesting on a branch, each chirping away occasionally. Suddenly, for a brief moment, they all start chirping vigorously (maybe a hawk flew past). A clustering of chirps in time is the signal that something has happened! The analogous situation occurs in instruments consisting of many similar detector elements, each generating some amount of random noise that, on its own, is indistinguishable from any evidence left by particles, but which, taken together, signals that, again, something has happened —a muon, an electron, a neutrino has left a sudden spume of electronic evidence in your instrument, waiting to be read out and distinguished from the endless noise.

This process of separating the noise from the signal is known in physics as triggering and is typically done through some combination of spatial or time clustering; in many cases, time is the simplest to handle and the first “line of defense” against being overrun by too much data. (It is often impractical to consume all the data generated by all the elements —data reduction is the name of the game at most stages of these experiments.)

This data is typically generated continously ad infinitum, and must therefore be processed differently than, say, a single file on disk. Such infinite sequences of data are an excellent fit for the functional pattern known as laziness, in which, rather than chewing up all your RAM and/or hard disk space, data is consumed and transformed only as needed / as available. This kind of processing is baked into Clojure at many levels and throughout its library of core functions, dozens of which can be combined (“composed”) to serve an endless variety of data transformations. (This style of data wrangling is also available in Python via generators and functional libraries such as Toolz.)

Prompted by a recent question on the topic from a physicist and former colleague, I got to thinking about the classic problem of triggering, and realized that the time series trigger provides a nice showcase for Clojure’s core library and for processing lazy sequences. The rest of this post will describe a simple trigger, essentially what particle astrophysicists I know call a “simple majority trigger”; or a “simple multiplicity trigger” (depending on whom you talk to).

Now for some Clojure code. (A small amount of familiarity with Clojure’s simple syntax is recommended for maximum understanding of what follows.) We will build up our understanding through a series of successively more complex code snippets. The exposition follows closely what one might do in the Clojure REPL, building up successively more complete examples. In each case, we use take to limit what would otherwise be infinite sequences of data (so that our examples can terminate without keeping us waiting forever…).

First we create a sorted, infinite series of ever-increasing times (in, say, nsec):

(def times (iterate #(+ % (rand-int 1000)) 0))
;; Caution: infinite sequence...

(take 30 times)

;;=>
(0 955 1559 2063 2735 2858 3542 4067 4366 5246 5430 6168 7127 7932
 8268 8929 9426 9918 10436 10850 11680 12367 12569 13343 14155 14420
 15062 15171 15663 16355)

times is an infinite (but “unrealized”) series, constructed by iterating the anonymous function #(+ % (rand-int 1000)) which adds a random integer from 0 to 999 to its argument (starting with zero). The fact that it is infinite does not prevent us from defining it or (gingerly) interrogating it via take1.

Now, the way we’ll look for excesses is to look for groupings of hits (say, eight of them) whose first and last hit times are within 1 microsecond (1000 nsec) of each other. To start, there is a handy function called partition which groups a series in blocks of fixed length:

(take 10 (partition 8 times))

;;=>
((0 955 1559 2063 2735 2858 3542 4067)
 (4366 5246 5430 6168 7127 7932 8268 8929)
 (9426 9918 10436 10850 11680 12367 12569 13343)
 (14155 14420 15062 15171 15663 16355 16700 16947)
 (17919 17949 18575 18607 18849 19597 20410 20680)
 (20737 21289 21315 21323 21426 21637 22422 23000)
 (23477 24351 24426 25106 25861 26568 27511 28332)
 (29071 29831 29957 30761 31073 31914 32591 33187)
 (33878 34739 34842 35674 36444 36960 36983 37400)
 (37587 38012 38969 39131 39317 40135 40587 40759))

We’ll rewrite this using Clojure’s thread-last macro ->>, which is a very helpful tool for rewriting nested expressions as a more readable pipeline of successive function applications:

(->> times
     (partition 8)
     (take 10))

;;=>
((0 955 1559 2063 2735 2858 3542 4067)
 (4366 5246 5430 6168 7127 7932 8268 8929)
 ...same as above...)

However, this isn’t quite what we want, because it won’t find clusters of times close together who don’t happen to begin on our partition boundaries. To fix this, we use the optional step argument to partition:

(->> times
     (partition 8 1)
     (take 10))

;;=>
((0 955 1559 2063 2735 2858 3542 4067)
 (955 1559 2063 2735 2858 3542 4067 4366)
 (1559 2063 2735 2858 3542 4067 4366 5246)
 (2063 2735 2858 3542 4067 4366 5246 5430)
 (2735 2858 3542 4067 4366 5246 5430 6168)
 (2858 3542 4067 4366 5246 5430 6168 7127)
 (3542 4067 4366 5246 5430 6168 7127 7932)
 (4067 4366 5246 5430 6168 7127 7932 8268)
 (4366 5246 5430 6168 7127 7932 8268 8929)
 (5246 5430 6168 7127 7932 8268 8929 9426))

This is getting closer to what we want—if you look carefully, you’ll see that each row consists of the previous one shifted by one element. The next step is to grab (via map) the first and last times of each group, using juxt to apply both first and last to each subsequence….

(->> times
     (partition 8 1)
     (map (juxt last first))
     (take 10))

;;=>
([4067 0]
 [4366 955]
 [5246 1559]
 [5430 2063]
 [6168 2735]
 [7127 2858]
 [7932 3542]
 [8268 4067]
 [8929 4366]
 [9426 5246])

… and turn these into time differences:

(->> times
     (partition 8 1)
     (map (comp (partial apply -) (juxt last first)))
     (take 10))

;;=>
(4067 3411 3687 3367 3433 4269 4390 4201 4563 4180)

Note that so far these time differences are all > 1000. comp, above, turns a collection of multiple functions into a new function which is the composition of these functions, applied successively one after the other (right-to-left). partial turns a function of multiple arguments into a function of fewer arguments, by binding one or more of the arguments in a new function. For example,

((partial + 2) 3)

;;=>
5

((comp (partial apply -) (juxt last first)) [3 10])

;;=>
7

Recall that we only want events whose times are close to each other; say, whose duration is under a maximum limit of 1000 nsec. In general, to select only the elements of a sequence which satisfy a filter function, we use filter:

(->> times
     (partition 8 1)
     (map (comp (partial apply -) (juxt last first)))
     (filter (partial > 1000))
     (take 10))

;;=>
(960 942 827 763 597 682 997 836 986 966)

((partial > 1000) is a function of one argument which returns true if that argument is strictly less than 1000.)

Great! We now have total “durations”; for subsequences of 8 times, where the total durations are less than 1000 nsec.

But this is not actually that helpful. It would be better if we could get both the total durations and the actual subsequences satisfying the requirement (the analog of this in a real physics experiment would be returning the actual hit data falling inside the trigger window).

To do this, juxt once again comes to the rescue, by allowing us to juxt-apose the original data alongside the total duration to show both together….

(->> times
     (partition 8 1)
     (map (juxt identity (comp (partial apply -) (juxt last first))))
     (take 10))

;;=>
([(0 309 410 562 979 1423 2180 3159) 3159]
 [(309 410 562 979 1423 2180 3159 3585) 3276]
 [(410 562 979 1423 2180 3159 3585 4325) 3915]
 [(562 979 1423 2180 3159 3585 4325 4573) 4011]
 [(979 1423 2180 3159 3585 4325 4573 5074) 4095]
 [(1423 2180 3159 3585 4325 4573 5074 5942) 4519]
 [(2180 3159 3585 4325 4573 5074 5942 6599) 4419]
 [(3159 3585 4325 4573 5074 5942 6599 7458) 4299]
 [(3585 4325 4573 5074 5942 6599 7458 8128) 4543]
 [(4325 4573 5074 5942 6599 7458 8128 8439) 4114])

… and adapt our filter slightly to apply our filter only to the time rather than the original data:

(->> times
     (partition 8 1)
     (map (juxt identity (comp (partial apply -) (juxt last first))))
     (filter (comp (partial > 1000) second))
     (take 3))

;;=>
([(1577315 1577322 1577514 1577570 1577793 1577817 1577870 1578151)
  836]
 [(3119967 3120203 3120416 3120469 3120471 3120620 3120715 3120937)
  970]
 [(6752453 6752483 6752522 6752918 6752966 6753008 6753026 6753262)
  809])

Finally, to turn this into a function for later use, use defn and remove take:

(defn smt-8 [times]
  (->> times
       (partition 8 1)
       (map (juxt identity (comp (partial apply -) (juxt last first))))
       (filter (comp (partial > 1000) second))))

smt-8 consumes one, potentially infinite sequence and outputs another, “smaller” (but also potentially infinite) lazy sequence of time-clusters-plus-durations, in the form shown above.

Some contemplation will suggest many variants; for example, one in which some number of hits outside the trigger “window” are also included in the output. This is left as an exercise for the advanced reader.

A “real” physics trigger would have to deal with many other details: each hit, in addition to its time, would likely have an amplitude, a sensor ID, and other data associated with it. Also, the data may not be perfectly sorted, some sensors may drop out of the data stream, etc. But in some sense this prototypical time clustering algorithm is one of the fundamental building blocks of experimental high energy physics and astrophysics and was used (in some variant) in every experiment I worked on over a 25+ year period. The representation above is certainly one of the most succinct, and shows off the power and elegance of the language, its core library, and lazy sequences. (It is also reasonably fast for such a simple algorithm; smt-8 consumes input times at a rate of about 250 kHz. This is not, however, fast enough for an instrument like IceCube, whose 5160 sensors each count at a rate of roughly 300 Hz, for a total rate of 1.5 MHz. A future post may look at ways to get better performance.)

Footnotes:

1

To model a Poisson process — one in which any given event time is independent of the future or past times — one would normally choose an exponential rather than a uniformly flat distribution of time differences, but this is not important for our discussion, so, in the interest of simplicity, we’ll go with what we have here.

Earlier post Fun with Instaparse code clojure

Later post Macro-writing Macros code clojure

Blog Posts (140)


Home

Reflections on a Year of Daily Memory Drawings art

Repainting art

Daily Memory Drawings art

Questions to Ask art

Macro-writing Macros code clojure

Fun with Instaparse code clojure

Nucleotide Repetition Lengths code clojure

Updating the Genome Decoder code clojure

Getting Our Hands Dirty (with the Human Genome) code clojure

Validating the Genome Decoder code clojure

A Two Bit Decoder code clojure

Exploratory Genomics with Clojure code clojure

Rosalind Problems in Clojure code clojure

Introduction to Context Managers in Python code python

Processes vs. Threads for Integration Testing code python

Resources for Learning Clojure code clojure

Continuous Testing in Python, Clojure and Blub code clojure python

Programming Languages code

Milvans and Container Malls southpole

Oxygen southpole

Ghost southpole

Turkey, Stuffing, Eclipse southpole

Wind Storm and Moon Dust southpole

Shower Instructions southpole

Fresh Air and Bananas southpole

Traveller and the Human Chain southpole

Reveille southpole

Drifts southpole

Bon Voyage southpole

A Nicer Guy? southpole

The Quiet Earth southpole

Ten southpole

ISO50 southpole art

In Defense of Hobbies misc code art

Closure southpole

Takeoff southpole

Mummification southpole

Eleventh Hour southpole

Diamond southpole

Baby, It’s Cold Outside southpole

Fruition southpole

Settling In southpole

Revolution Number Nine southpole

Red Eye to McMurdo southpole

Faults in Ice and Rock southpole

Bardo southpole

Chasing the Sun southpole

Downhole southpole

Coming Out of Hibernation southpole

Managing the Most Remote Data Center in the World code southpole

Photoshop on a Dime art

Man on Wire misc

Posing Rigs art

Metric art

Cuba southpole

Wickets southpole

Safe southpole

Broken Glasses southpole

End of the Second Act southpole

Pigs and Fish southpole

Last Arrivals southpole

Lily White southpole

In a Dry and Waterless Place southpole

Immortality southpole

Routine southpole

Tourists southpole

Passing Notes southpole

Translation southpole

The Usual Delays southpole

RNZAF southpole

CHC southpole

Wyeth on Another Planet art

Detox southpole

Packing southpole

Nails southpole

Gearing Up southpole

Gouache, and a new system for conquering the world art

Fall 2008 HPAC Studies art

YABP (Yet Another Blog Platform) southpole

A Bath southpole

Green Marathon southpole

Sprung southpole

Outta Here southpole

Lame Duck DAQer southpole

Eclipse Town southpole

One More Week southpole

IceCube Laboratory Video Tour; Midrats Finale southpole

SPIFF, Party, Shift Change southpole

Good things come in threes, or 18s southpole

Sleepless in the Station southpole

Post Deploy southpole

Midrats southpole

IceCube and The Beatles southpole

Video: Flight to South Pole southpole

The Pure Land southpole

Almost There southpole

There are no mice in the Hotel California Bunkroom southpole

Short Timer southpole

Sleepy in MacTown southpole

Superposition of Luggage States southpole

Sir Ed southpole

Shortcut to Toast southpole

Pynchon, Redux southpole

Flights: Round 1 southpole

Packing for the Pole southpole

Goals for Trip southpole

Balaklavas southpole

Tree and Man (Test Post) southpole

Schedule southpole

How to mail stuff to John at the South Pole southpole

Summer and Winter southpole

Homeward Bound southpole

Redeployment southpole

Short-timer southpole

The Cleanest Air in the World southpole

One more day (?) southpole

One more week (?) southpole

Closing Softly southpole

More Photos southpole

Super Bowl Wednesday southpole

Night Owls southpole

First Week southpole

More Ice Pix southpole

Settling In southpole

NPX southpole

Pole Bound southpole

Bad Dirt southpole

The Last Laugh southpole

Nope southpole

First Delay southpole

Batteries and Sheep southpole

All for McNaught southpole

The Big (Really really big…) Picture southpole

t=0 southpole

Giacometti southpole

Descent southpole

Video Tour southpole

How to subscribe to blog updates southpole

What The Blog is For southpole

Home