An Ostler in IT

Horses for Courses

Some trivial examples of using Clojure Transducers

TL;DR: first take on Clojure’s new Transducers

Introduction

Rich Hickey published a post a few days ago on the forthcoming Transducers.

Transducers should make their first appearance in Clojure 1.7.0 and the examples below were run using 1.7.0-alpha1

The first paragraph of Rich’s post describes Transducers as:

Transducers are a powerful and composable way to build algorithmic transformations that you can reuse in many contexts, and they’re coming to Clojure core and core.async.

Rich also says:

The other source of power comes from the fact that transducers compose using ordinary function composition.

Transducers follow on from Reducers, first announced by Rich in May 2012.

I wrote a post back in late 2012 called Some trivial examples of using Clojure Reducers in which I attempted to offer some practical, if contrived, examples of how to use reducers.

This post reprises the examples in the reducers post but now includes equivalent contrived transducers examples as well.

I’ve includes some material from the reducers post for completeness, but without any explanation, so you don’t have to flip between the two. But you may want to refer to the reducers post for reducers background.

The Code

The Code - repo is on Github

The repo with the example code can be found on Github. Its a Leiningen project.

The Code - running the examples

Run the examples in the usual way

1
lein run -m transducers-examples1

The Code - code is a tangled org-mode file

The source code and project.clj are generated (tangled) from the org-mode source of this post found in the doc folder.

The Code - misc

The reducers namespace is r in the examples below i.e.

1
[clojure.core.reducers :as r]

There is no need to require transducers.

The Examples Collection

Many of the following examples will use the same collection holding the population of a small village. The village has four families, some families have two parents, others one; and each family between one and three children. One family lives in the North, and one each in the South, East and West.

This is stylised, artificial and contrived data designed to be easily understandable to most and in no way intended to suggest any family structures, conventions or arrangements as socially preferable. Just saying.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
;; The Families in the Village

(def village
  [{:home :north :family "smith" :name "sue" :age 37 :sex :f :role :parent}
   {:home :north :family "smith" :name "stan" :age 35 :sex :m :role :parent}
   {:home :north :family "smith" :name "simon" :age 7 :sex :m :role :child}
   {:home :north :family "smith" :name "sadie" :age 5 :sex :f :role :child}

   {:home :south :family "jones" :name "jill" :age 45 :sex :f :role :parent}
   {:home :south :family "jones" :name "jeff" :age 45 :sex :m :role :parent}
   {:home :south :family "jones" :name "jackie" :age 19 :sex :f :role :child}
   {:home :south :family "jones" :name "jason" :age 16 :sex :f :role :child}
   {:home :south :family "jones" :name "june" :age 14 :sex :f :role :child}

   {:home :west :family "brown" :name "billie" :age 55 :sex :f :role :parent}
   {:home :west :family "brown" :name "brian" :age 23 :sex :m :role :child}
   {:home :west :family "brown" :name "bettie" :age 29 :sex :f :role :child}

   {:home :east :family "williams" :name "walter" :age 23 :sex :m :role :parent}
   {:home :east :family "williams" :name "wanda" :age 3 :sex :f :role :child}])

Example 1 - how many children in the village?

An obvious way to do this would be to map each child to the value 1, else 0. The total number of children is then the simple addition of the 1s & 0s results of the map operation.

Example 1a - using a reducer to count how many children in the village

The reducers way looks like this:

1
2
3
4
5
6
7
;; Example 1a - using a reducer to add up all the mapped values

(def ex1a-map-children-to-value-1 (r/map #(if (= :child (:role %)) 1 0)))

(r/reduce + 0 (ex1a-map-children-to-value-1 village))
;;=>
8

Example 1b - using a transducer to count how many children in the village

The transducers way looks very similar. Literally the only difference is to use core map rather than the reducers map.

First the transducer function ex1b-map-children-to-value-1 is created using the new arity for map that takes just the mapping function, no collection.

Then the transducer is used with the new transduce function to reduce the collection and return the answer. transduce take the transducer function as its first argument, then the “usual” reducer arguments of reducing function, initial value and collection.

1
2
3
4
5
6
7
8
9
10
11
;; Example 1b - using a transducer to add up all the mapped values

;; create the transducers using the new arity for map that
;; takes just the function, no collection

(def ex1b-map-children-to-value-1 (map #(if (= :child (:role %)) 1 0)))

;; now use transduce (c.f r/reduce) with the transducer to get the answer 
(transduce ex1b-map-children-to-value-1 + 0 village)
;;=>
8

Example 2 - how many children in the Brown family?

An obvious way to find how many children just in the Brown family would be to select ( filter) the members of the Brown family, and use the same map function - e.g. ex1a-map-children-to-value-1 - from Example 1 to count the children.

Example 2a - using a reducer to count the children in the Brown family

Along with map, reducers have a filter function that returns another function that can be used with reduce:

1
2
3
4
5
6
7
8
9
10
11
12
;; Example 2a - using a reducer to count the children in the Brown family

;; create the reducer to select members of the Brown family
(def ex2a-select-brown-family (r/filter #(= "brown" (string/lower-case (:family %)))))

;; compose a composite function to select the Brown family and map children to 1
(def ex2a-count-brown-family-children (comp ex1a-map-children-to-value-1 ex2a-select-brown-family))

;; reduce to add up all the Brown children
(r/reduce + 0 (ex2a-count-brown-family-children village))
;;=>
2

Its worth observing reducers reduce does not need to create any intermediate collections.

Example 2b - using a transducer to count the children in the Brown family

The transducer-aware core filter function can be used to select the Brown family members, while the transducer ex1b-map-children-to-value-1 can be used to map children to 1, else 0.

As with reducers, the two functions ex2b-select-brown-family and ex1b-map-children-to-value-1 can be composed together.

And as before, transduce is used to count (reduce) the number of children.

1
2
3
4
5
6
7
8
9
10
11
12
13
;; Example 2b - using a transducer to count the children in the Brown family

;; create the transducer filter to select members of the Brown family
(def ex2b-select-brown-family (filter #(= "brown" (string/lower-case (:family %)))))

;; compose a composite function to select the Brown family and map children to 1
;; NOTE: transducer comp functions are applied left-to-right
(def ex2b-count-brown-family-children (comp ex2b-select-brown-family ex1b-map-children-to-value-1))

;; transduce to add up all the Brown children
(transduce ex2b-count-brown-family-children + 0 village)
;;=>
2

Note there is a gotcha here. The functions in the composed transducer ex2b-count-brown-family-children are applied left-to-right not right-to-left as is usual with comp.

Although not explicitly stated in Rich’s post I guess transducers do not create intermediate collections either.

Example 3 - how many children’s names start with J?

We already know the answer: just the 3 children in the Jones family.

Algorithmically, this is a three step pipeline: filter on children, a filter on names beginning with “J” (or “j”) and finally count of how many (children) in the result.

Example 3a - using a reducer to count children with names beginning with J

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
;; Example 3a - using a reducer to count children with names beginning with J

;; select (filter) just the children
(def ex3a-select-children (r/filter #(= :child (:role %))))

;; select names beginning with "j"
(def ex3a-select-names-beginning-with-j (r/filter #(= "j" (string/lower-case (first (:name %))))))

;; In Example 1 we created the _map_ function
;; ex1a-map-children-to-value-1 to enable reduce to count the number
;; of children.

;; But the need to count the number of entries in a collection using
;; reduce, after a pipeline possibly involving many filters and
;; mappers, is a common one. This is straightforward to do, in the final
;; stage of the pipeline use a map function to transform each entry to
;; value 1.

;; map entries in a collection to 1
(def ex0a-map-to-value-1 (r/map (fn [v] 1)))

;; create the three step count-children-with-names-beginning-j function
(def ex3a-count-children-with-names-beginning-j (comp ex0a-map-to-value-1
                                                      ex3a-select-names-beginning-with-j
                                                      ex3a-select-children))

;; reduce the village with the ex32-count-children-with-names-beginning-j function
(r/reduce + 0 (ex3a-count-children-with-names-beginning-j village))
;; =>
3

Its worth labouring the point that composing the custom reducer count-children-with-names-beginning-js from individual filters and mappers is a very powerful technique.

Example 3b - using a transducer to count children with names beginning with J

As with transducers map, filter has a new arity, taking just the filtering function, no collection.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
;; Example 3b - using a transducer to count children with names beginning with J

;; select (filter) just the children
(def ex3b-select-children (filter #(= :child (:role %))))

;; select names beginning with "j"
(def ex3b-select-names-beginning-with-j (filter #(= "j" (string/lower-case (first (:name %))))))

;; map entries in a collection to 1
(def ex0b-map-to-value-1 (map (fn [v] 1)))

;; create the three step count-children-with-names-beginning-j function
;; note the left-to-right order in which the indivudal transducers
(def ex3b-count-children-with-names-beginning-j (comp ex3b-select-children
                                                      ex3b-select-names-beginning-with-j
                                                      ex0b-map-to-value-1))

;; transduce the village with the ex3b-count-children-with-names-beginning-j transducer
(transduce ex3b-count-children-with-names-beginning-j + 0 village)
;; =>
3

Again, note the ex3b-count-children-with-names-beginning-j transducer applies it constituent transducers left-to-right.

Example 4 - creating a collection of children whose names start with J?

Sometimes you will want the resulting collection itself, post map, filter, etc, and not reduced any further.

Example 4a - using a reducer to create a collection of children whose names start with J?

Since we want the actual entries, rather than count them, we need a pure filter pipeline, similar to Example 3 but one that doesn’t use the ex0a-map-to-value-1 mapper.

Creating a vector of the J children can be done simply by using into.

Under the covers into uses reduce so creating the vector is just a matter of applying the ex4a-select-children-with-names-beginning-with-j reducers to the village and then using the resulting collection with into to create the vector.

1
2
3
4
5
6
7
8
9
10
11
12
;; Example 4a - a reducer to create a collection of children whose names start with J?

;; create a reducing function to select (filter) children with names starting with "J"
(def ex4a-select-children-with-names-beginning-with-j (comp ex3a-select-names-beginning-with-j
                                                            ex3a-select-children))

;; use into to create a vector of the "J" children
(into [] (ex4a-select-children-with-names-beginning-with-j village))
;; =>
[{:age 19, :home :south, :name "jackie", :sex :f, :family "jones", :role :child}
 {:age 16, :home :south, :name "jason", :sex :f, :family "jones", :role :child}
 {:age 14, :home :south, :name "june", :sex :f, :family "jones", :role :child}]

Example 4b - using a transducer to create a collection of children whose names start with J?

Very similar to reducers, transducers can use into to create a collection.

Note into takes the transducer and the collection as arguments, not just the collection.

1
2
3
4
5
6
7
8
9
10
11
12
;; Example 4b - a transducer to create a collection of children whose names start with J?

;; create a reducing function to select (filter) children with names starting with "J"t
(def ex4b-select-children-with-names-beginning-with-j (comp ex3b-select-names-beginning-with-j
                                                            ex3b-select-children))

;; use into to create a vector of the "J" children
(into [] ex4b-select-children-with-names-beginning-with-j village)
;; =>
[{:age 19, :home :south, :name "jackie", :sex :f, :family "jones", :role :child}
 {:age 16, :home :south, :name "jason", :sex :f, :family "jones", :role :child}
 {:age 14, :home :south, :name "june", :sex :f, :family "jones", :role :child}]

Example 5 - calculate the average age of children on or below the equator

A more involved, but still straightforward, example to finish this section: what is the average age of the children who live on or below the equator? By equator I mean where home is East, South or West.

To do this, the value of home will be mapped to a latitude and longitude. For example West will be :lat 0 :lng -180 and South is :lat -90 :lng 0.

Example 5a - using a reducer to calculate the average age of children on or below the equator

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
;; Example 5a - using a reducer to calculate the average age of children on or below the equator

;; map :home to latitude and longitude
(def ex5a-map-home-to-latitude-and-longitude
  (r/map
   (fn [v]
     (condp = (:home v)
       :north (assoc v :lat 90 :lng 0)
       :south (assoc v :lat -90 :lng 0)
       :west (assoc v :lat 0 :lng -180)
       :east (assoc v :lat 0 :lng 180)))))


;; select people on or below the equator i.e. latitude <= 0
(def ex5a-select-people-on-or-below-equator (r/filter #(>= 0 (:lat %))))

;; To find the average age, we need to add up all the children's ages and
;; divide by how many children.

;; Note, rather than creating a composite pipeline function, in this example
;; the individual stages of the pipeline are used explicitly.

;; count the number of children on or below the equator
(def ex5a-no-children-on-or-below-the-equator
  (r/reduce + 0
          (ex0a-map-to-value-1
           (ex5a-select-people-on-or-below-equator
            (ex5a-map-home-to-latitude-and-longitude
             (ex3a-select-children village))))))


;; sum the ages of children
(def ex5a-select-age (r/map #(:age %)))

(def ex5a-sum-of-ages-of-children-on-or-below-the-equator
  (r/reduce + 0
          (ex5a-select-age
           (ex5a-select-people-on-or-below-equator
            (ex5a-map-home-to-latitude-and-longitude
             (ex3a-select-children village))))))


;; calculate the average age of children on or below the equator
(def ex5a-averge-age-of-children-on-or-below-the-equator
  (float (/ ex5a-sum-of-ages-of-children-on-or-below-the-equator ex5a-no-children-on-or-below-the-equator )))
;; =>
17.3

Example 5b - using a transducer to calculate the average age of children on or below the equator

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
;; Example 5b - using a transducer to calculate the average age of children on or below the equator

;; map :home to latitude and longitude
(def ex5b-map-home-to-latitude-and-longitude
  (map
   (fn [v]
     (condp = (:home v)
       :north (assoc v :lat 90 :lng 0)
       :south (assoc v :lat -90 :lng 0)
       :west (assoc v :lat 0 :lng -180)
       :east (assoc v :lat 0 :lng 180)))))


;; select people on or below the equator i.e. latitude <= 0
(def ex5b-select-people-on-or-below-equator (filter #(>= 0 (:lat %))))

;; create a "utility" transducer to select children on or below the equator
(def ex5b-select-children-on-or-below-the-equator
  (comp ex3b-select-children
        ex5b-map-home-to-latitude-and-longitude
        ex5b-select-people-on-or-below-equator)) 

;; create a transducer to count the number of children on or below the equator
(def ex5b-count-children-on-or-below-the-equator
  (comp ex5b-select-children-on-or-below-the-equator
        ex0b-map-to-value-1))

;; now count the number of children on or below the equator
(def ex5b-no-children-on-or-below-the-equator
  (transduce ex5b-count-children-on-or-below-the-equator + 0 village))

;; create a transducer to extract the age
(def ex5b-select-age (map #(:age %)))

;; create a transducer to extract the ages of all chilren
;; on or below the equator
(def ex5b-extract-ages-of-children-on-or-below-thew-equator
  (comp
   ex5b-select-children-on-or-below-the-equator
   ex5b-select-age))

;; now sum the ages
(def ex5b-sum-of-ages-of-children-on-or-below-the-equator
  (transduce ex5b-extract-ages-of-children-on-or-below-thew-equator + 0 village ))

;; calculate the average age of children on or below the equator
(def ex5b-averge-age-of-children-on-or-below-the-equator
  (float (/ ex5b-sum-of-ages-of-children-on-or-below-the-equator ex5b-no-children-on-or-below-the-equator)))
;; =>
17.3

Example 6 - comparing the performance of reducers and transducers

Lets time the addition of the ages from Example 5 using reducers reduce and fold, and transducers transduce.

Example 6a - time a reducer adding up Example 5’s ages

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
;; Example 6a - time a reducer adding up Example 5's ages

;; time reducer adding up Example 5's ages
(time (dotimes [n 100000] (r/reduce +
                                  (ex5a-select-age
                                   (ex5a-select-people-on-or-below-equator
                                    (ex5a-map-home-to-latitude-and-longitude
                                     (ex3a-select-children village)))))))
;; =>
"Elapsed time: ~700 msecs"

;; time fold adding up Example 5's ages
(time (dotimes [n 100000] (r/fold +
                               (ex5a-select-age
                                (ex5a-select-people-on-or-below-equator
                                 (ex5a-map-home-to-latitude-and-longitude
                                  (ex3a-select-children village)))))))
;; =>
"Elapsed time: ~700 msecs"

Its not possible to see the benefits on fold’s ability to parallelise on this volume of data (village).

Example 6b - time a transducer adding up Example 5’s ages

1
2
3
4
5
;; Example 6b - time a transducer adding up Example 5's ages

(time (dotimes [n 100000] (transduce ex5b-extract-ages-of-children-on-or-below-thew-equator + 0 village)))
;; =>
"Elapsed time: ~700 msecs"

So, in this example, its not possible to see any performance difference between reducers and transducers.

Example 7 - all the relatives visit the village!

Its that time of year again and all the relatives of the families in the village visit and the population of the village swells enormously to 10 million people.

Example 7 - make some visitors

Lets define some functions to create an influx of visitors. Note, no attempt has been made to ensure this randomly generated data makes any sort of real world sense - it could include e.g. a child of age 100.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
;; Example 7 - make some visitors

(def ex7-fn-random-name (fn [] (rand-nth ["chris" "jim" "mark" "jon" "lisa" "kate" "jay" "june" "julie" "laura"])))
(def ex7-fn-random-family (fn [] (rand-nth ["smith" "jones" "brown" "williams" "taylor" "davies"])))
(def ex7-fn-random-home (fn [] (rand-nth [:north :south :east :west])))
(def ex7-fn-random-sex (fn [] (rand-nth [:m :f])))
(def ex7-fn-random-role (fn [] (rand-nth [:child :parent])))
(def ex7-fn-random-age (fn [] (rand-int 100)))

(def ex7-visitor-template
  {:home ex7-fn-random-home
   :family ex7-fn-random-family
   :name ex7-fn-random-name
   :age ex7-fn-random-age
   :sex ex7-fn-random-sex
   :role ex7-fn-random-role})

(defn ex7-make-visitor [] (into {} (for [[k v] ex7-visitor-template] [k (v)])))

(defn ex7-make-visitors [n] (take n (repeatedly ex7-make-visitor)))

(def ex7-visitors (into [] (ex7-make-visitors 10000000)))

Example 7a - using reducers to count the visiting Brown children

1
2
3
4
5
6
7
8
9
10
11
;; Example 7a - using reducers count the visiting Brown children

;; count the visiting Brown children using reduce
(time (r/reduce + 0 (ex2a-count-brown-family-children ex7-visitors)))
;; =>
"Elapsed time: ~1600 msecs"

;; count the visiting Brown children using fold
(time (r/fold + (ex2a-count-brown-family-children ex7-visitors)))
;; =>
"Elapsed time: ~555 msecs"

Example 7b - using transducers to count the visiting Brown children

1
2
3
4
5
6
;; Example 7b - using reducers count the visiting Brown children

;; count the visiting Brown children using transduce
(time (transduce ex2b-count-brown-family-children + 0 ex7-visitors))
;; =>
"Elapsed time: ~1640 msecs"

Example 7c - using core map, filter and reduce to count the visiting Brown children

How do the reducers and transducers fare against core reduce?

1
2
3
4
5
6
7
8
9
;; Example 7c - using core map, filter and reduce to count the visiting Brown children

;; count the visiting Brown children using core map, filter and reduce
(time (reduce + 0
              (map #(if (= :child (:role %)) 1 0)
                   (filter #(= "brown" (string/lower-case (:family %))) ex7-visitors))))

;; =>
"Elapsed time: ~2000 msecs"

In the original reducers post I found reducers fold was nearly four times faster than reducers reduce. Here fold is about three times faster. (I’m using the same four core workstation.)

Transducers come out with around the same time as reducers reduce.

All these numbers should be taken with a large pinch of salt, they are just a “wet finger” and this is in no way a rigorous benchmark.

Final Words

I’ve only scratched the surface of transducers in this post of course and Rich’s post has opened only a crack in the door to understanding the potential of transducers; there will be more to come, appreciate and learn I’m sure.

So far I’ve found transducers more immediately understandable than I did reducers, maybe because I already have a reasonable grasp of the latter and some existing mental context to understand the former. It will be interesting to hear / learn how other people new to both grok transducers.

I’ve been doing a project using reducers fold for the parallisation benefits and have noticed I have needed to consciously mentally “switch” between the core world and reducers world. In that sense I think reducers have an “impedance mismatch” with the rest of core.

On the other hands, writing the examples above I’ve felt transducers are more “grounded” with core; indeed they are core (there is no transducers namespace).

To sum up my opinion in a pithy one-liner: transducers are reducers decomplected.