Reusable storm bolt for storing data to S3
Reusable Storm bolt for archiving data to file. Currently supports storing to S3.
- Writing objects to S3 no longer uses a temporary file.
- Bumped Storm version to 0.9.3.
- archive bolt now takes a "meta" field in the input tuple, which is passed through to the output tuple.
- Tests now require environment variables. See the “Running Tests” section.
- archive-read-filtered bolt, which must be initialized with a predicate function for filtering search results. The predicate function must be quoted with a backtick.
- log-warn if no results are found during archive read.
Takes a tuple of ["meta", "backend", "location", "content"], where backend is where the content will be stored, location is the path the file should be saved to, and content is a string (JSON) to be written to the file.
Emits a bolt of
- Attempts to safely retry storing to the specified backend so the caller does not have to
- Emits a boolean if it was successful
- Calls ack! on the tuple if successful, fail! if unsuccessful
Takes a tuple of ["meta", "backend", "location"], where backend is a string naming the backend the content is stored in and location is the path the file should be read from.
For S3, if there are more than 1,000 results, the bolt automatically paginates to yield all results.
- Emits a tuple of
- Results is a collection representing the value of all keys at the given location, plus metadata. Each item in the collection has keys
:value is the string from the document corresponding to the location.
- Calls ack! on the tuple whether or not there were results
- Only emits if there are results returned
The library used to interact with S3, amazonica, uses a Java library that requires org.apache.httpcomponents/httpclient version 4.2.5+. The Storm 0.9.0.1 distribution package ships with an out-of-date version of this library (this is updated in later versions), so it must be replaced on all servers in the cluster.
Replacing Apache httpclient
From the server running a storm distribution:
    # Go to where the jars live
    cd storm-0.9.0.1/lib
    # Get an updated version of httpclient from maven
    wget http://repo1.maven.org/maven2/org/apache/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.jar
    # Backup the old package
    cp httpclient-4.1.1.jar httpclient-4.1.1.jar.bak
    # Delete it
    rm httpclient-4.1.1.jar
Make sure to restart any running Storm processes so the new classpath takes effect.
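Because the replacement has to be repeated on every server, it may help to confirm the result before restarting. The following is a minimal sketch, not part of this project: check_httpclient is a hypothetical helper that lists the httpclient jars in a given lib directory and flags any older than the minimum version (4.2.5 by default). It assumes a sort that supports -V (version ordering), as GNU coreutils does.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this library): list httpclient jars
# in a Storm lib directory and flag any older than the required minimum.
# Assumes `sort -V` (version sort) is available, e.g. GNU coreutils.
check_httpclient() {
    lib_dir="$1"
    min_version="${2:-4.2.5}"
    status=0
    found=0
    for jar in "$lib_dir"/httpclient-*.jar; do
        # Skip the unexpanded pattern when no jar matches.
        [ -e "$jar" ] || continue
        found=1
        version=$(basename "$jar" .jar)
        version=${version#httpclient-}
        # The lower of the two versions sorts first.
        lowest=$(printf '%s\n%s\n' "$version" "$min_version" | sort -V | head -n 1)
        if [ "$lowest" = "$min_version" ]; then
            echo "ok: $version"
        else
            echo "too old: $version"
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no httpclient jar found"
        status=1
    fi
    return $status
}
```

For example, `check_httpclient storm-0.9.0.1/lib` exits non-zero if httpclient-4.1.1.jar is still present. The .bak backup is ignored because it does not end in .jar.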
Integration tests rely on environment variables being available for writing/reading from S3.
AWS_ACCESS_KEY_ID=<aws id> AWS_SECRET_ACCESS_KEY=<aws secret key> AWS_S3_REGION=<s3 region> S3_BUCKET=<bucket name> lein test
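Since the integration tests depend on all four variables, a pre-flight check can surface a missing one before any S3 call is made. This is a sketch only: missing_env is a hypothetical helper, not something the project provides, and it treats an empty value the same as an unset one.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of this project): print the names
# of any required variables that are unset or empty.
missing_env() {
    missing=""
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_S3_REGION S3_BUCKET; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    # Strip the leading space; empty output means everything is set.
    echo "${missing# }"
}
```

One way to use it is `[ -z "$(missing_env)" ] && lein test`, which runs the tests only when nothing is missing.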