CrossClj

0.8.2 docs

    VARS
    aggregate
    cache
    cartesian
    close
    coalesce
    collect
    combine-by-key
    count
    count-by-key
    count-by-value
    defsparkfn
    distinct
    double-untuple
    filter
    first
    flat-map
    flat-map-to-pair
    flat-map-values
    fn
    fold
    foreach
    foreach-partition
    foreach-rdd
    ftruthy?
    glom
    group-by
    group-by-key
    group-untuple
    hash-partitioner
    histogram
    iterator
    iterator-fn
    jar-of-ns
    join
    left-outer-join
    local-spark-context
    map
    map-partitions
    map-partitions-to-pair
    map-partitions-with-index
    map-to-pair
    map-values
    max
    min
    parallelize
    parallelize-pairs
    partition-by
    partition-count
    partitioner
    partitions
    partitionwise-sampled-rdd
    persist
    rdd-name
    reduce
    reduce-by-key
    repartition
    sample
    save-as-sequence-file
    save-as-text-file
    sort-by-key
    spark-context
    STORAGE-LEVELS
    subtract
    take
    take-ordered
    text-file
    union
    unpersist
    untuple
    values
    whole-text-files
    with-context


    (aggregate rdd zero-value seq-op comb-op)
    Aggregates the elements of each partition, and then the results for all the partitions,
    using a given combine function and a neutral 'zero value'.
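    For example (a sketch, assuming this namespace is aliased as f), computing a
    running [sum count] pair in one pass, e.g. for an average:
      (f/aggregate rdd [0 0]
                   (f/fn [[s c] x] [(+ s x) (inc c)])
                   (f/fn [[s1 c1] [s2 c2]] [(+ s1 s2) (+ c1 c2)]))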

    (cache rdd)
    Persists rdd with the default storage level (MEMORY_ONLY).
    
    (cartesian rdd1 rdd2)
    Creates the cartesian product of two RDDs returning an RDD of pairs
    
    (coalesce rdd n)(coalesce rdd n shuffle?)
    Decrease the number of partitions in rdd to n.
    Useful for running operations more efficiently after filtering down a large dataset.

    (collect rdd)
    Returns all the elements of rdd as an array at the driver process.
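    For example (a sketch; the alias f and a context sc are assumed):
      (f/collect (f/parallelize sc [1 2 3]))  ;; all three elements back at the driver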
    
    (combine-by-key rdd create-combiner merge-value merge-combiners & {:keys [n]})
    Combines the elements for each key using a custom set of aggregation functions.
    Turns an RDD of (K, V) pairs into a result of type (K, C), for a 'combined type' C.
    Note that V and C can be different -- for example, one might group an RDD of type
    (Int, Int) into an RDD of type (Int, List[Int]).
    Users must provide three functions:
    -- createCombiner, which turns a V into a C (e.g., creates a one-element list)
    -- mergeValue, to merge a V into a C (e.g., adds it to the end of a list)
    -- mergeCombiners, to combine two C's into a single one.
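    For example (a sketch, alias f assumed), collecting the values for each key
    into a vector:
      (f/combine-by-key pairs
                        (f/fn [v] [v])                ;; create-combiner: V -> C
                        (f/fn [c v] (conj c v))       ;; merge-value: C, V -> C
                        (f/fn [c1 c2] (into c1 c2)))  ;; merge-combiners: C, C -> C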

    (count rdd)
    Returns the number of elements in rdd.
    
    (count-by-key rdd)
    Only available on RDDs of type (K, V).
    Returns a map of (K, Int) pairs with the count of each key.

    (count-by-value rdd)
    Return the count of each unique value in rdd as a map of (value, count)
    pairs.
    macro
    (defsparkfn name & body)
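    For example (a sketch; the macro is assumed to define a named serializable
    function for use in RDD operations):
      (f/defsparkfn square [x] (* x x))
      (f/map rdd square)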
    (distinct rdd)(distinct rdd n)
    Return a new RDD that contains the distinct elements of the source rdd.
    
    (double-untuple t)
    (filter rdd f)
    Returns a new RDD containing only the elements of rdd that satisfy a predicate f.
    
    (first rdd)
    Returns the first element of rdd.
    
    (flat-map rdd f)
    Similar to map, but each input item can be mapped to 0 or more output items (so the
    function f should return a collection rather than a single item)
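    For example (a sketch, alias f assumed), splitting lines into words:
      (f/flat-map lines (f/fn [line] (clojure.string/split line #"\s+")))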
    (flat-map-to-pair rdd f)
    Returns a new JavaPairRDD by first applying f to all elements of rdd, and then flattening
    the results.
    (flat-map-values rdd f)
    Applies function f to the values of JavaPairRDD rdd, returning an iterator of new values.
    
    macro
    (fn & body)
    (fold rdd zero-value f)
    Aggregates the elements of each partition, and then the results for all the partitions,
    using a given associative function and a neutral 'zero value'
    (foreach rdd f)
    Applies the function f to all elements of rdd.
    
    (foreach-partition rdd f)
    Applies the function f to each partition iterator of rdd.
    
    (foreach-rdd dstream f)
    Applies the function f to each rdd in dstream
    
    Private
    (ftruthy? f)

    (glom rdd)
    Returns an RDD created by coalescing all elements of rdd within each partition into a list.
    
    (group-by rdd f)(group-by rdd f n)
    Returns an RDD of items grouped by the return value of function f.
    
    (group-by-key rdd & {:keys [n]})
    Groups the values for each key in rdd into a single sequence.
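    For example (a sketch; alias f and a context sc are assumed), grouping numbers
    by parity with group-by:
      (f/group-by (f/parallelize sc (range 10)) (f/fn [x] (mod x 2)))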
    
    (group-untuple t)
    (hash-partitioner n)(hash-partitioner subkey-fn n)

    (histogram rdd buckets)
    Computes the histogram of an RDD of doubles.
    
    macro
    (iterator-fn & body)
    (jar-of-ns ns)
    (join rdd other)
    When called on rdd of type (K, V) and other of type (K, W), returns a dataset of
    (K, (V, W)) pairs with all pairs of elements for each key.
    (left-outer-join rdd other)
    Performs a left outer join of rdd and other. For each element (K, V)
     in the RDD, the resulting RDD will either contain all pairs (K, (V, W)) for W in other,
    or the pair (K, (V, nil)) if no elements in other have key K.
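    For example (a sketch; alias f, and the pair RDDs users and orders keyed by
    user id, are assumed):
      (f/join users orders)             ;; (id, (user, order)) for ids present in both
      (f/left-outer-join users orders)  ;; also keeps (id, (user, nil)) for unmatched users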
    (local-spark-context app-name)
    (map rdd f)
    Returns a new RDD formed by passing each element of the source through the function f.
    
    (map-partitions rdd f)
    Similar to map, but runs separately on each partition (block) of the rdd, so function f
    must be of type Iterator<T> => Iterable<U>.
    (See https://issues.apache.org/jira/browse/SPARK-3369.)
    (map-partitions-to-pair rdd f & {:keys [preserves-partitioning], :or {preserves-partitioning false}})
    Similar to map-partitions, but runs separately on each partition (block) of the rdd, so function f
    must be of type Iterator<T> => Iterable<scala.Tuple2<K,V>>.
    (map-partitions-with-index rdd f)
    Similar to map-partitions, but function f is of type (Int, Iterator<T>) => Iterator<U>, where
    the Int argument is the index of the partition.
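    For example (a sketch, alias f assumed; the function is assumed to receive and
    return a java.util.Iterator), tagging each element with its partition index:
      (f/map-partitions-with-index rdd
        (f/fn [i it] (.iterator (map (partial vector i) (iterator-seq it)))))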
    (map-to-pair rdd f)
    Returns a new JavaPairRDD of (K, V) pairs by applying f to all elements of rdd.
    
    (map-values rdd f)
    Apply function f over the values of JavaPairRDD rdd returning a new JavaPairRDD.
    
    (max rdd)(max rdd compare-fn)
    Return the maximum value in rdd using a comparator.
    
    (min rdd)(min rdd compare-fn)
    Return the minimum value in rdd using a comparator.
    
    (parallelize spark-context lst)(parallelize spark-context lst num-slices)
    Distributes a local collection to form/return an RDD
    
    (parallelize-pairs spark-context lst)(parallelize-pairs spark-context lst num-slices)
    Distributes a local collection to form/return a Pair RDD
    
    (partition-by rdd partitioner)
    Partition rdd by partitioner.
    
    (partition-count rdd)
    Returns the number of partitions for a given rdd.
    
    (partitioner rdd)
    (partitions rdd)
    Returns a vector of partitions for a given rdd.
    
    (partitionwise-sampled-rdd rdd sampler preserve-partitioning? seed)
    Creates a PartitionwiseSampledRDD from an existing RDD and a sampler object
    
    (persist rdd storage-level)
    Sets the storage level of rdd to persist its values across operations
    after the first time it is computed. storage levels are available in the `STORAGE-LEVELS' map.
    This can only be used to assign a new storage level if the RDD does not have a storage level set already.
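    For example (a sketch; alias f and the :memory-and-disk key are assumptions):
      (f/persist rdd (:memory-and-disk f/STORAGE-LEVELS))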
    (rdd-name rdd name)(rdd-name rdd)
    (reduce rdd f)
    Aggregates the elements of rdd using the function f (which takes two arguments
    and returns one). The function should be commutative and associative so that it can be
    computed correctly in parallel.
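    For example (a sketch, alias f assumed), summing an RDD of numbers:
      (f/reduce rdd (f/fn [a b] (+ a b)))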
    (reduce-by-key rdd f)
    When called on an rdd of (K, V) pairs, returns an RDD of (K, V) pairs
    where the values for each key are aggregated using the given reduce function f.
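    For example (a sketch; alias f is assumed, and pairs are built with scala.Tuple2
    directly), the classic word count:
      (-> (f/text-file sc "input.txt")
          (f/flat-map (f/fn [line] (clojure.string/split line #"\s+")))
          (f/map-to-pair (f/fn [word] (scala.Tuple2. word 1)))
          (f/reduce-by-key (f/fn [a b] (+ a b))))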
    (repartition rdd n)
    Returns a new rdd with exactly n partitions.
    
    (sample rdd with-replacement? fraction seed)
    Returns a fraction sample of rdd, with or without replacement,
    using a given random number generator seed.
    (save-as-sequence-file rdd path)
    Writes the elements of rdd as a Hadoop SequenceFile in a given path
    in the local filesystem, HDFS or any other Hadoop-supported file system.
    This is available on RDDs of key-value pairs that implement Hadoop's
    Writable interface.
    (save-as-text-file rdd path)(save-as-text-file rdd path compression-codec)
    Writes the elements of rdd as a text file (or set of text files)
    in a given directory path in the local filesystem, HDFS or any other Hadoop-supported
    file system. Spark will call toString on each element to convert it to a line of
    text in the file.
    (sort-by-key rdd)(sort-by-key rdd x)(sort-by-key rdd compare-fn asc?)
    When called on rdd of (K, V) pairs where K implements Ordered, returns a dataset of
    (K, V) pairs sorted by keys in ascending or descending order, as specified by the
    boolean asc? argument.
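    For example (a sketch, alias f assumed), sorting a pair RDD by key in descending order:
      (f/sort-by-key pairs false)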
    (spark-context conf)(spark-context master app-name)
    Creates a spark context that loads settings from a given configuration object
    or from system properties.
    (subtract rdd other)
    Returns an RDD consisting of the elements of rdd with the elements of other removed.
    
    (take rdd cnt)
    Return an array with the first cnt elements of rdd.
    (Note: this is currently not executed in parallel. Instead, the driver
    program computes all the elements).
    (take-ordered rdd cnt)(take-ordered rdd cnt compare-fn)
    Return an array with the first cnt elements of rdd.
    (Note: this is currently not executed in parallel. Instead, the driver
    program computes all the elements).
    (text-file spark-context filename)(text-file spark-context filename min-partitions)
    Reads a text file from HDFS, a local file system (available on all nodes),
    or any Hadoop-supported file system URI, and returns it as a JavaRDD of Strings.
    (union rdd other)(union context rdd & rdds)
    Union rdd and other, or multiple RDDs. Duplicates are kept.
    
    (unpersist rdd)
    (values rdd)
    Get the values from a Pair RDD.
    
    (whole-text-files spark-context path)(whole-text-files spark-context path min-partitions)
    Read a directory of text files from HDFS, a local file system (available on all nodes),
    or any Hadoop-supported file system URI, and return it as a JavaPairRDD of (K, V) pairs,
    where K is the path of each file and V is the content of each file.
    macro
    (with-context context-sym conf & body)
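    For example, a minimal end-to-end sketch (the aliases f for this namespace and
    conf for a sibling configuration namespace are assumptions):
      (f/with-context sc (-> (conf/spark-conf)
                             (conf/master "local[*]")
                             (conf/app-name "example"))
        (-> (f/parallelize sc [1 2 3 4 5])
            (f/map (f/fn [x] (* x x)))
            (f/collect)))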