CrossClj

1.0.2 docs

SourceDocs



RECENT
    VARS
    any
    assert-compiled-fold
    chunk
    chunk-combiner
    chunk-reducer
    compile-fold
    count
    deftransform
    deftransform*
    defwraptransform
    empty?
    every?
    extremum
    facet
    filter
    fold
    frequencies
    fuse
    group-by
    into
    keep
    map
    mapcat
    max
    min
    not-every?
    post-combine
    range
    reduce
    remove
    replace
    set
    some
    take
    tesser
    transform
    wrap-transform

    « Index of all namespaces of this project

    The essential folds: map, mapcat, take, filter, some, any?,
    into, etc, plus common fold combinators.
    
    > "Now we will tesser, we will wrinkle again. Do you understand?" "No,"
    > Meg said flatly. Mrs. Whatsit sighed. "Explanations are not easy when they
    > are about things for which your civilization still has no words. Calvin
    > talked about traveling at the speed of light. You understand that, little
    > Meg?" "Yes," Meg nodded. "That, of course, is the impractical, long way
    > around. We have learned to take short cuts wherever possible." "Sort of
    > like in math?" Meg asked. "Like in math."
    >
    > -- Madeline L'Engle, *A Wrinkle In Time*.
    
    Tesser structures partly-concurrent folds.
    
    `(tesser some-collections a-fold)` uses a-fold to combine some information
    from every element in some-collections: a collection of collections of
    inputs. Like reducers and transducers, it takes *fold* as the ultimate
    collection operation. Unlike transducers, Tesser folds are not sequential.
    They include an explicitly concurrent reduction over arbitrary partitions of
    the collection, and a sequential reduction over *those* reduced values,
    termed *combine*. Like transducers, Tesser folds can also map and filter
    their inputs, and perform post-processing transformation after each reduction
    and the combine step.
    
    Tesser folds can be composed with the standard collection threading idioms,
    and evaluated against a collection of collections using `(tesser colls
    fold)`.
    
        (->> (t/map inc)
             (t/filter even?)
             (t/fold +)
             (t/tesser [[1 2 3] [4 5 6]]))
        ; => 2 + 4 + 6 = 12
    (any)(any fold__17031__auto__)
    Returns any single input from the collection. O(chunks).
    For instance:
    
        (t/tesser [[1 2 3] [4 5 6]] (t/any))
        ; => 4
    (assert-compiled-fold f)
    Is this a valid compiled fold? Throws with a helpful message if the fold is
    invalid.
    Private
    (chunk-combiner {:keys [combiner]})
    Given a compiled fold, returns a reducing function that merges new
    post-reduced values into an accumulator combined, iff the current
    accumulator is not already reduced.
    Private
    (chunk-reducer {:keys [reducer reducer-identity post-reducer]})
    Given a compiled fold, constructs a function which takes a chunk and returns
    its post-reduced value.
    (compile-fold fold)
    Compiles a fold (a sequence of transforms, each represented as a function
    taking the next transform) to a single map like
    
        {:reducer-identity  (fn [] ...),
         :reducer           (fn [acc x] ...)
         ...}
    (count)(count fold__17031__auto__)
    How many inputs are there?
    For instance:
    
        (->> (t/filter even?)
             (t/count)
             (t/tesser [[1 2 3] [4 5 6]]))
        ; => 3
    macro
    (deftransform & args)
    Deftransform, assuming transforms should be appended to the end of the fold;
    e.g. innermost.
    macro
    (deftransform* conjoiner name docstring args & body)
    We're trying to build functions that look like...
    
        (defn map
          "Takes a function f and an optional fold. Returns a version of the
          fold which finally calls (f element) to transform each element."
          ([f]
           (map f []))
          ([f fold]
           (append fold
                   (fn build [{:keys [reducer] :as downstream}]
                     (assoc downstream :reducer
                            (fn reducer [acc input] (reducer acc (f input)))))))))
    
    Which involves a fair bit of shared boilerplate: the single-arity variant of
    the transform, the append/prepend logic, the annealing function and its
    destructuring bind, etc. We'll wrap these up in an anaphoric macro called
    deftransform, which takes a function (e.g. append) to conjoin this
    transform with the fold. Within the body, reducer-identity-, reducer-,
    post-reducer-, combiner-identity-, combiner-, post-combiner- are all
    bound to the downstream transform's component functions, and downstream is
    bound to the downstream transform itself.
    macro
    (defwraptransform & args)
    Like deftransform, but prepends the given transform to the beginning of the
    fold; e.g. outermost.
    (empty? & [f])
    Returns true iff no inputs arrive; false otherwise.
    For instance:
    
        (t/tesser [[]] (t/empty?))
        ; => true
    (every? pred & [f])
    True iff every input satisfies the given predicate, false otherwise. For
    instance:
    
        (t/tesser [[1 3 5]] (t/every? odd?))
        ; => true
    (extremum compare)(extremum compare fold__17031__auto__)
    Finds the largest element using a comparison function, e.g. compare.
    For example:
    
        (t/tesser [[5 4] [3 2] [1 0]] (t/extremum compare))
        ; => 5
    (facet)(facet fold__17031__auto__)
    Your inputs are maps, and you want to apply a fold to each value
    independently. Facet generalizes a fold over a single value to operate on
    maps of keys to those values, returning a map of keys to the results of the
    fold over all values for that key. Each key gets an independent instance of
    the fold.
    
    For instance, say you have inputs like
    
        {:x 1, :y 2}
        {}
        {:y 3, :z 4}
    
    Then the fold
    
        (->> (facet)
             (mean))
    
    returns a map for each key's mean value:
    
        {:x 1, :y 2, :z 4}
    (filter pred)(filter pred fold__17031__auto__)
    Takes a predicate function pred and an optional fold. Returns a version of
    the fold which only passes on inputs to subsequent transforms when (pred
    input) is truthy.
    
        (->> (t/filter odd?)
             (t/into [])
             (t/tesser [[1 2 3 4 5 6]]))
         ; => [1 3 5]
    (fold m)(fold m fold__17031__auto__)
    An arbitrary terminal fold. Takes a compiled fold and yields an uncompiled
    fold that can be composed with the usual Tesser transforms (map, filter,
    etc), or passed directly to tesser. For instance, a sum of numbers:
    
    (->> (t/fold {:reducer-identity  (constantly 0)
                  :reducer           +
                  :post-reducer      identity
                  :combiner-identity (constantly 0)
                  :combiner          +
                  :post-combiner     identity})
         (t/tesser [[5 6 7] [] [8]]))
    ; => 26
    
    Fold has several shortcuts to make defining folds more concise:
    
    - :reducer: if m is a function, not a map, defaults to m.
    - :combiner: defaults to :reducer
    - :reducer-identity: defaults to :identity, or else :reducer
    - :combiner-identity: defaults to :identity, or else :combiner
    - :post-reducer: defaults to :reducer, or identity if :reducer has no
      unary arity.
    - :post-combiner defaults to :combiner, or identity if :combiner has
      no unary arity.
    
    So we can find a sorted set of all inputs using:
    
    (->> (t/fold {:identity sorted-set
                  :reducer conj
                  :combiner into})
         (t/tesser [[1 2] [2 3]]))
    ; => #{1 2 3}
    
    And if we provide a function instead of a map, Tesser interprets it using the
    transducer-style arities: (m) for identity, `(m acc)` for
    post-reducer/post-combiner, `(m acc input)` for reducer/combiner, etc.
    
    (t/tesser [[1 2 3] [4 5 6]] (t/fold +))
    ; => 21
    (frequencies)(frequencies fold__17031__auto__)
    Like clojure.core/frequencies, returns a map of inputs to the number of
    times those inputs appeared in the collection.
    
        (t/tesser [[1 2 3] [1 1 1]] (t/frequencies))
        ; => {1 4, 2 1, 3 1}
    (fuse fold-map)(fuse fold-map fold__17031__auto__)
    You've got several folds, and want to execute them in one pass. Fuse is the
    function for you! It takes a map from keys to folds, like
    
        (->> (map parse-person)
             (fuse {:age-range    (->> (map :age) (range))
                    :colors-prefs (->> (map :favorite-color) (frequencies))})
             (tesser people))
    
    And returns a map from those same keys to the results of the corresponding
    folds:
    
        {:age-range   [0 74],
         :color-prefs {:red        120
                       :blue       312
                       :watermelon 1953
                       :imhotep    1}}
    
    Note that this fold only invokes parse-person once for each record, and
    completes in a single pass. If we ran the age and color folds independently,
    it'd take two passes over the dataset--and require parsing every person
    *twice*.
    
    Fuse and facet both return maps, but generalize over different axes. Fuse
    applies a fixed set of *independent* folds over the *same* inputs, where
    facet applies the *same* fold to a dynamic set of keys taken from the
    inputs.
    
    Note that fuse compiles the folds you pass to it, so you need to build them
    completely *before* fusing. The fold fuse returns can happily be combined
    with other transformations at its level, but its internal folds are sealed
    and opaque.
    (group-by category-fn)(group-by category-fn fold__17031__auto__)
    Every input belongs to exactly one category, and you'd like to apply a fold
    to each category separately.
    
    Group-by takes a function that returns a category for every element, and
    returns a map of those categories to the results of the downstream fold
    applied to the inputs in that category.
    
    For instance, say we have a collection of particles of various types, and
    want to find the highest mass of each particle type:
    
        (->> (t/group-by :type)
             (t/map :mass)
             (t/max)
             (t/tesser [[{:name :electron, :type :lepton, :mass 0.51}
                         {:name :muon,     :type :lepton, :mass 105.65}
                         {:name :up,       :type :quark,  :mass 1.5}
                         {:name :down,     :type :quark,  :mass 3.5}]]))
        ; => {:lepton 105.65, :quark 3.5}
    (into coll)(into coll fold__17031__auto__)
    Adds all inputs to the given collection using conj. Ordering of elements
    from distinct chunks is undefined.
    
        (t/tesser [[1 2 3] [4 5 6] [7 8 9]] (t/into []))
        ; => [7 8 9 1 2 3 4 5 6]
    (keep f)(keep f fold__17031__auto__)
    Takes a function f and an optional fold. Returns a version of the fold
    which finally calls (f input) to transform each element, and passes it on to
    subsequent transforms only when the result of (f input) is truthy.
    
        (->> (t/keep {:a 1 :b 2})
             (t/into [])
             (t/tesser [[:a :b] [:c :d]]))
        ; => [1 2]
    (map f)(map f fold__17031__auto__)
    Takes a function f and an optional fold. Returns a version of the fold
    which finally calls (f input) to transform each element.
    
        (->> (t/map inc) (t/into []) (t/tesser [[1 2] [3 4]]))
        ; => [2 3 4 5]
    (mapcat f)(mapcat f fold__17031__auto__)
    Takes a function f and an optional fold. Returns a version of the fold
    which finally calls (f input) to transform each element. (f input) should
    return a *sequence* of inputs which will be fed to the downstream transform
    independently.
    
        (->> (t/mapcat seq) ; explode strings into seqs of chars
             (t/set)
             (t/tesser [["meow" "mix"]]))
        ; => #{\e \i \m \o \w \x}
    (max & [f])
    Finds the largest value using compare.
    For example:
    
        (t/tesser [[:a :b] [:c :d]] (t/max))
        ; => :d
    (min & [f])
    Finds the smallest value using compare.
    For example:
    
        (t/tesser [[:a :b] [:c :d]] (t/min))
        ; => :a
    (not-every? pred & [f])
    True if there exists an input which does *not* satisfy the given predicate.
    For instance:
    
        (t/tesser [[1 3 5] [6]] (t/not-every? odd?))
        ; => true
    (post-combine f)(post-combine f fold__17031__auto__)
    Transforms the output of a fold by applying a function to it.
    
    For instance, to find the square root of the mean of a sequence of numbers,
    try
    
        (->> (t/mean) (t/post-combine sqrt) (t/tesser nums))
    
    For clarity in ->> composition, post-combine composes in the opposite
    direction from map, filter, etc. It *prepends* a transform to the given fold
    instead of *appending* one. This means post-combines take effect in the same
    order you'd expect from ->> with normal function calls:
    
        (->> (t/mean)                 (->> (mean nums)
             (t/post-combine sqrt)         (sqrt)
             (t/post-combine inc))         (inc))
    (range & [f])
    Returns a pair of `[smallest largest]` inputs, using compare.
    For example:
    
        (t/tesser [[4 5 6] [1 2 3]] (t/range))
        ; => [1 6]
    (reduce f init)(reduce f init fold__17031__auto__)
    A fold that uses the same function for the reduce and combine phases. Unlike
    normal Clojure reduce, this reduce doesn't take a collection: it just returns
    a fold which can be applied to a collection via tesser. Why? You might want
    to compose the reduction with something else using fuse, map it with
    post-combine, etc etc.
    
    Follows the clojure reducers and transducers conventions for arities:
    
    - `(constantly init)` is used to generate identity elements.
    - `(f acc input)` folds elements in the reduce and combine phases.
    - `(f acc)` post-reduces and post-combines, unless `(f acc)` throws
      clojure.lang.ArityException, in which case we return acc directly.
    
    This means you can use probably `(reduce f init)` as a phase anywhere f is
    associative and commutative, and where init is immutable.
    
        (->> (t/map inc)
             (t/reduce + 0)
             (t/tesser [[1 2 3] [4 5 6]]))
        ; => 27
    
    Due to technical limitations Tesser can't distinguish between
    
        (reduce + upstream-fold)
    
    where we're transforming an uncompiled fold by adding a reduce phase, and
    
        (reduce + 0)
    
    where we're defining a new phase out of thin air with 0 as the initial value.
    Consequently, we *always* interpret the second argument as an initial value.
    We *don't* provide an equivalent for `(reduce +)` yet. Someday. Use `(fold +
    +)` or `(reduce + (+))` instead.
    (remove pred)(remove pred fold__17031__auto__)
    Takes a predicate function pred and an optional fold. Returns a version of
    the fold which only passes on inputs to subsequent transforms when (pred
    input) is nil or false.
    
        (->> (t/remove odd?)
             (t/into [])
             (t/tesser [[1 2 3 4 5 6]]))
         ; => [2 4 6]
    (replace m & [f])
    Given a map of replacement pairs, maps any inputs which are keys in the map
    to their corresponding values. Leaves unrecognized inputs alone.
    
        (->> (t/replace {:x false})
             (t/into [])
             (t/tesser [[:x :y]]))
        ; => [false :y]
    (set)(set fold__17031__auto__)
    A hash-set of distinct inputs.
    For instance:
    
        (->> (t/map inc)
             (t/set)
             (t/tesser [[1 2 3] [4 5 6]]))
        ; => #{7 4 6 3 2 5}
    (some pred)(some pred fold__17031__auto__)
    Returns the first logical true value of (pred input). If no such satisfying
    input exists, returns nil.
    
    This is potentially *less* efficient than clojure.core/some because each
    reducer has to find a matching element independently, and they have no way to
    communicate when one has found an element. In the worst-case scenario,
    requires N calls to pred. However, unlike clojure.core/some, this version
    is parallelizable--which can make it more efficient when the element is rare.
    
        (t/tesser [[1 2 3] [4 5 6]] (t/some #{1 2}))
        ; => 1
    (take n)(take n fold__17031__auto__)
    Like clojure.core/take, limits the number of inputs passed to the downstream
    transformer to exactly n, or if fewer than n inputs exist in total, all
    inputs.
    
        (->> (t/map inc)
             (t/take 5)
             (t/into [])
             (t/tesser [[1 2 3] [4 5 6] [7 8 9]]))
        ; => [6 7 5 3 4]
    
    Space complexity note: take's reducers produce log2(chunk-size) reduced
    values per chunk, ranging from 1 to chunk-size/2 inputs, rather than a single
    reduced value for each chunk. See the source for *why* this is the case.
    (tesser chunks fold)
    Compiles a fold and applies it to a sequence of sequences of inputs. Runs
    num-procs threads for the parallel (reducer) portion of the fold. Reducers
    take turns combining their results, which prevents unbounded memory
    consumption by the reduce phase.
    
        (t/tesser [["hi"] ["there"]] (t/fold str))
        ; => "therehi"
    (transform f)(transform f fold__17031__auto__)
    An arbitrary transform. Takes a function that maps one compiled fold map to
    another, and an optional uncompiled fold, and returns an uncompiled fold with
    that transformation applied innermost; e.g. it controls inputs *last*, and
    post-processes outputs *first*.
    (wrap-transform f)(wrap-transform f fold__17031__auto__)
    An arbitrary transform. Takes a function that maps one compiled fold map to
    another, and an optional uncompiled fold, and returns an uncompiled fold with
    that transformation applied outermost; e.g. it controls inputs *first*, and
    post-processes outputs *last*.