    VARS
    aggregate
    cache
    cartesian
    checkpoint
    coalesce
    coalesce-max
    cogroup
    collect
    collect-map
    combine-by-key
    count
    count-by-key
    count-by-value
    count-partitions
    distinct
    filter
    first
    flat-map
    flat-map-to-pair
    flat-map-values
    fold
    foreach
    foreach-partition
    glom
    group-by
    group-by-key
    hash-partitioner
    histogram
    intersection
    join
    key-by
    keys
    left-outer-join
    local-spark-context
    lookup
    map
    map-partition
    map-partition-with-index
    map-partitions-to-pair
    map-to-pair
    map-values
    max
    min
    parallelize
    parallelize-pairs
    partition-by
    partitioner
    partitioner-aware-union
    partitions
    partitionwise-sampled-rdd
    persist
    rdd-name
    reduce
    reduce-by-key
    rekey-preserving-partitioning-without-check
    repartition
    sample
    save-as-text-file
    sort-by-key
    spark-context
    STORAGE-LEVELS
    subtract
    subtract-by-key
    take
    text-file
    tuple
    uncache
    union
    values
    with-context
    zip-with-index
    zip-with-unique-id


    (aggregate rdd zero-value seq-op comb-op)
    Aggregates the elements of each partition using seq-op, and then the results for all
    the partitions using comb-op, starting each aggregation from the neutral 'zero-value'.
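    For example, summing an RDD of numbers (a minimal sketch, assuming this
    namespace is aliased as `spark` and `sc` is a live SparkContext):

        (spark/aggregate (spark/parallelize sc [1 2 3 4 5])
                         0   ;; zero-value
                         +   ;; seq-op: folds elements within a partition
                         +)  ;; comb-op: merges the per-partition results
        ;; => 15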

    (cache rdd)
    Persists rdd with the default storage level (MEMORY_ONLY).
    
    (cartesian rdd1 rdd2)
    Creates the cartesian product of two RDDs returning an RDD of pairs
    
    (coalesce rdd n)(coalesce rdd n shuffle?)
    Decrease the number of partitions in rdd to n.
    Useful for running operations more efficiently after filtering down a large dataset.
    (coalesce-max rdd n)(coalesce-max rdd n shuffle?)
    Decrease the number of partitions in rdd to n.
    Useful for running operations more efficiently after filtering down a large dataset.

    (collect rdd)
    Returns all the elements of rdd as an array at the driver process.

    (collect-map pair-rdd)
    Returns all elements of pair-rdd as a map at the driver process.
    Attention: The resulting map will only have one entry per key.
               Thus, if pair-rdd contains multiple tuples with the same key, the returned map will not contain all of the elements!
               The function itself will *not* issue a warning of any kind!
    (combine-by-key rdd create-combiner merge-value merge-combiners)(combine-by-key rdd create-combiner merge-value merge-combiners n)
    Combines the elements for each key using a custom set of aggregation functions.
    Turns an RDD of (K, V) pairs into a result of type (K, C), for a 'combined type' C.
    Note that V and C can be different -- for example, one might group an RDD of type
    (Int, Int) into an RDD of type (Int, List[Int]).
    Users must provide three functions:
    -- createCombiner, which turns a V into a C (e.g., creates a one-element list)
    -- mergeValue, to merge a V into a C (e.g., adds it to the end of a list)
    -- mergeCombiners, to combine two C's into a single one.
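    For instance, collecting each key's values into a vector (a minimal sketch,
    assuming this namespace is aliased as `spark` and `sc` is a live SparkContext):

        (-> (spark/parallelize-pairs sc [(spark/tuple "a" 1)
                                         (spark/tuple "b" 2)
                                         (spark/tuple "a" 3)])
            (spark/combine-by-key vector  ;; create-combiner: V -> [V]
                                  conj    ;; merge-value: adds a V to a [V]
                                  into)   ;; merge-combiners: merges two [V]s
            spark/collect-map)
        ;; => {"a" [1 3], "b" [2]} (value order may vary)
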
    (distinct rdd)(distinct rdd n)
    Returns a new RDD containing the distinct elements of the source rdd.
    
    (filter rdd f)
    Returns a new RDD containing only the elements of rdd that satisfy a predicate f.
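    A minimal sketch, assuming this namespace is aliased as `spark` and `sc` is a
    live SparkContext:

        (-> (spark/parallelize sc (range 10))
            (spark/filter even?)
            spark/collect)
        ;; => [0 2 4 6 8], as an array at the driver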
    
    (flat-map rdd f)
    Similar to map, but each input item can be mapped to 0 or more output items (so the
    function f should return a collection rather than a single item)
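    For example, splitting lines into words (a minimal sketch, assuming `spark`
    aliases this namespace, `sc` is a live SparkContext, and clojure.string is loaded):

        (-> (spark/parallelize sc ["hello world" "lorem ipsum"])
            (spark/flat-map (fn [line] (clojure.string/split line #" ")))
            spark/collect)
        ;; => ["hello" "world" "lorem" "ipsum"]
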
    (flat-map-to-pair rdd f)
    Returns a new JavaPairRDD by first applying f to all elements of rdd, and then flattening
    the results.
    (flat-map-values rdd f)
    Returns a new pair RDD by applying f to each value of rdd without changing the keys,
    and then flattening the results.
    (fold rdd zero-value f)
    Aggregates the elements of each partition, and then the results for all the partitions,
    using a given associative function and a neutral 'zero value'
    (foreach rdd f)
    Applies the function f to all elements of rdd.
    
    (foreach-partition rdd f)
    Applies the function f to each partition of rdd; f receives an iterator over the partition's elements.
    
    (glom rdd)
    Returns an RDD created by coalescing all elements of rdd within each partition into a list.
    
    (group-by rdd f)(group-by rdd f n)
    Returns an RDD of items grouped by the return value of function f.
    
    (group-by-key rdd)(group-by-key rdd n)
    Groups the values for each key in rdd into a single sequence.
    
    (hash-partitioner n)(hash-partitioner subkey-fn n)
    Returns a hash-based Partitioner over n partitions; with subkey-fn, tuples are
    partitioned by the hash of (subkey-fn key) rather than of the full key.
    (histogram rdd buckets)
    Computes a histogram of the numeric elements of rdd, where buckets is either a
    number of evenly spaced buckets or a seq of bucket boundaries.
    (intersection rdd1 rdd2)
    Returns a new RDD containing only the elements present in both rdd1 and rdd2.
    (join rdd other)
    When called on rdd of type (K, V) and other of type (K, W), returns a dataset of
    (K, (V, W)) pairs with all pairs of elements for each key.
    (key-by rdd f)
    Creates tuples of the elements in this RDD by applying f.
    
    (left-outer-join rdd other)
    Performs a left outer join of rdd and other. For each element (K, V)
    in rdd, the resulting RDD will either contain all pairs (K, (V, W)) for W in other,
    or the pair (K, (V, nil)) if no elements in other have key K.
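    A minimal sketch contrasting join and left-outer-join, assuming `spark` aliases
    this namespace and `sc` is a live SparkContext:

        (def users  (spark/parallelize-pairs sc [(spark/tuple 1 "ann")
                                                 (spark/tuple 2 "bob")]))
        (def orders (spark/parallelize-pairs sc [(spark/tuple 1 "book")]))

        (spark/collect (spark/join users orders))
        ;; => only (1, ("ann", "book"))
        (spark/collect (spark/left-outer-join users orders))
        ;; => (1, ("ann", "book")) and (2, ("bob", nil))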

    (lookup pair-rdd key)
    Returns the vector of values in pair-rdd for key key. The key has to be serializable with the Java serializer (not Kryo, as used elsewhere) to use this.
    
    (map rdd f)
    Returns a new RDD formed by passing each element of the source through the function f.
    
    (map-partition rdd f)
    Similar to map, but runs separately on each partition (block) of the rdd, so function f
    must be of type Iterator<T> => Iterable<U>.
    https://issues.apache.org/jira/browse/SPARK-3369
    (map-partition-with-index rdd f)
    Similar to map-partition, but f is of type (Int, Iterator<T>) => Iterator<U>, where
    the integer argument is the index of the partition.
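    A hypothetical sketch tagging every element with its partition index; f receives
    a Java iterator and must also return one (rdd stands for an existing RDD):

        (spark/map-partition-with-index
          rdd
          (fn [idx it]
            ;; realize the partition, pair each element with idx,
            ;; and hand back an iterator as required
            (.iterator (mapv (fn [x] [idx x]) (iterator-seq it)))))
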
    (map-partitions-to-pair rdd f & {:keys [preserves-partitioning]})
    Similar to map, but runs separately on each partition (block) of the rdd, so function f
    must be of type Iterator<T> => Iterable<U>.
    https://issues.apache.org/jira/browse/SPARK-3369
    (map-to-pair rdd f)
    Returns a new JavaPairRDD of (K, V) pairs by applying f to all elements of rdd.
    
    (map-values rdd f)
    Returns a new pair RDD by applying f to each value of rdd, leaving the keys unchanged.
    (parallelize spark-context lst)(parallelize spark-context lst num-slices)
    Distributes a local collection to form/return an RDD
    
    (parallelize-pairs spark-context lst)(parallelize-pairs spark-context lst num-slices)
    Distributes a local collection of (key, value) tuples to form/return a pair RDD
    
    (partition-by rdd partitioner)
    Returns a copy of rdd partitioned according to the given partitioner.
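    A minimal sketch, assuming `spark` aliases this namespace and `pairs` stands for
    an existing pair RDD:

        ;; spread `pairs` across 4 partitions by key hash
        (spark/partition-by pairs (spark/hash-partitioner 4))
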
    (partitionwise-sampled-rdd rdd sampler preserve-partitioning? seed)
    Creates a PartitionwiseSampledRDD from an existing RDD and a sampler object
    
    (persist rdd storage-level)
    Sets the storage level of rdd to persist its values across operations
    after the first time it is computed. Storage levels are available in the `STORAGE-LEVELS` map.
    This can only be used to assign a new storage level if the RDD does not have a storage level set already.
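    A minimal sketch; the exact keyword naming the storage level in STORAGE-LEVELS
    is an assumption here:

        ;; keep rdd in memory, spilling to disk if it does not fit
        ;; (the :memory-and-disk key is assumed)
        (spark/persist rdd (:memory-and-disk spark/STORAGE-LEVELS))
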
    (rdd-name rdd name)(rdd-name rdd)
    With two arguments, assigns name to rdd; with one, returns the current name of rdd.
    (reduce rdd f)
    Aggregates the elements of rdd using the function f (which takes two arguments
    and returns one). The function should be commutative and associative so that it can be
    computed correctly in parallel.
    (reduce-by-key rdd f)
    When called on an rdd of (K, V) pairs, returns an RDD of (K, V) pairs
    where the values for each key are aggregated using the given reduce function f.
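    The classic word count, as a minimal sketch assuming `spark` aliases this
    namespace and `sc` is a live SparkContext:

        (-> (spark/parallelize sc ["a" "b" "a" "c" "a"])
            (spark/map-to-pair (fn [w] (spark/tuple w 1)))
            (spark/reduce-by-key +)
            spark/collect-map)
        ;; => {"a" 3, "b" 1, "c" 1}
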
    (rekey-preserving-partitioning-without-check rdd rekey-fn)
    This re-keys a pair-rdd by applying the rekey-fn to generate new tuples. However, it does not check whether your new keys would keep the same partitioning, so watch out!!!!
    
    (repartition rdd n)
    Returns a new rdd with exactly n partitions.
    
    (sample rdd with-replacement? fraction seed)
    Returns a fraction sample of rdd, with or without replacement,
    using a given random number generator seed.
    (save-as-text-file rdd path)(save-as-text-file rdd path codec-class)
    Writes the elements of rdd as a text file (or set of text files)
    in a given directory path in the local filesystem, HDFS or any other Hadoop-supported
    file system. Supports an optional codec class like org.apache.hadoop.io.compress.GzipCodec.
    Spark will call toString on each element to convert it to a line of
    text in the file.
    (sort-by-key rdd)(sort-by-key rdd x)(sort-by-key rdd compare-fn asc?)
    When called on an rdd of (K, V) pairs where K implements Ordered, returns a dataset of
    (K, V) pairs sorted by keys in ascending or descending order, as specified by the
    boolean asc? argument.
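    For example, sorting in descending order with the compare-fn arity (a minimal
    sketch, assuming `spark` aliases this namespace and `sc` is a live SparkContext):

        (-> (spark/parallelize-pairs sc [(spark/tuple 2 "b")
                                         (spark/tuple 1 "a")
                                         (spark/tuple 3 "c")])
            (spark/sort-by-key compare false)  ;; asc? = false => descending
            spark/collect)
        ;; => (3,"c") (2,"b") (1,"a")
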
    (subtract rdd1 rdd2)
    Removes all elements from rdd1 that are present in rdd2.
    
    (subtract-by-key rdd1 rdd2)
    Returns each (key, value) pair in rdd1 that has no pair with a matching key in rdd2.
    
    (take rdd cnt)
    Returns an array with the first cnt elements of rdd.
    (Note: this is currently not executed in parallel. Instead, the driver
    program computes all the elements).

    (uncache rdd)(uncache rdd blocking?)
    Marks rdd as non-persistent (removes all blocks for it from memory and disk). If blocking? is true, blocks until the operation is complete.
    
    (values rdd)
    Returns the values of a JavaPairRDD
    
    macro
    (with-context context-sym conf & body)
    Evaluates body with context-sym bound to a new SparkContext built from conf,
    stopping the context once body has run.
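    A minimal end-to-end sketch; building the SparkConf through a sparkling.conf-style
    namespace (aliased here as `conf`) is an assumption:

        (with-context sc (-> (conf/spark-conf)
                             (conf/master "local[*]")
                             (conf/app-name "example"))
          ;; `sc` is only valid inside the body
          (-> (spark/parallelize sc (range 100))
              (spark/map inc)
              (spark/reduce +)))
        ;; => 5050; the context bound to `sc` is stopped automatically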