1. Overview
This is a sample service for managing assets.
1.3. License information
License: Apache 2.0
License URL: http://www.apache.org/licenses/LICENSE-2.0.html
2. Background
The asset Web services consist of RESTful services fronting a distributed database. Copies of the service would be deployed to all data centers. The services are stateless and should be scaled horizontally behind load balancers. Each data center would maintain a full copy of all data, preferably three copies as availability is a stated priority. Communication between the services and data nodes is configurable based upon consistency requirements. However, in the most likely configuration, the asset Web services would read and write only to data nodes in the local data center. Data nodes would be responsible for replicating writes across data centers.
2.1. Database Recommendation
Requirements prioritize availability and partition tolerance above consistency and explicitly state that eventual consistency is acceptable/expected. Cassandra is well-suited to these requirements. Many distributed databases use master/slave replication, which has worse write performance and more complex failure characteristics. Cassandra is data center aware and provides a replication approach that has been validated by numerous internet-scale companies such as Netflix and Facebook. I have never used Cassandra before, but preliminary research pointed me in that direction. I then read Cassandra High Availability and became more confident in the choice. The limited prototype works well.
2.1.1. Resolving Data Inconsistencies
Cassandra is capable of recovering automatically from temporary network failures. All database operations include a timestamp. When resolving conflicts, the most recent timestamp wins. In the extremely unlikely case that two conflicting operations occurred at the exact same microsecond, Cassandra uses the byte value of the data to consistently resolve the conflict. There are two important time windows for recovering from node/network failures in Cassandra: the time that “hinted handoffs” are stored and the time that tombstones (markers for deletes) are stored. Additional recovery steps are required if these windows are exceeded.
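The conflict rule described above can be modeled in a few lines of Java. This is a minimal sketch, not Cassandra's implementation: the class and method names are hypothetical, and the direction of the tie-break (the larger byte value wins) is an assumption made for illustration. The point is only that every replica, given the same two writes, picks the same winner.

```java
import java.nio.charset.StandardCharsets;

// Illustrative model only; names are hypothetical and this is not Cassandra code.
final class LastWriteWins {

    // A written value together with its write timestamp in microseconds.
    static final class Cell {
        final long timestampMicros;
        final byte[] value;

        Cell(long timestampMicros, String value) {
            this.timestampMicros = timestampMicros;
            this.value = value.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Newest timestamp wins. On an exact tie, fall back to comparing the bytes
    // (assumption: larger value wins) so the result does not depend on argument order.
    static Cell resolve(Cell a, Cell b) {
        if (a.timestampMicros != b.timestampMicros) {
            return a.timestampMicros > b.timestampMicros ? a : b;
        }
        return compareUnsigned(a.value, b.value) >= 0 ? a : b;
    }

    // Lexicographic comparison of two byte arrays, treating bytes as unsigned.
    static int compareUnsigned(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(x.length, y.length);
    }
}
```

Because the tie-break depends only on the data, `resolve(a, b)` and `resolve(b, a)` agree whenever the two values differ, which is the property that lets replicas converge without coordination.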
2.1.2. Database Schema
Unlike most relational databases, Cassandra supports storage of multiple values in a column. This approach is used to store notes. It assists with the requirements that notes are deleted when an asset is deleted and that notes cannot be added to deleted assets. In addition, joins are not required to retrieve an asset’s notes.
CREATE TABLE assets (
    uri text PRIMARY KEY,  -- (1)
    name text,             -- (2)
    modtime timestamp,     -- (3)
    notes list<text>       -- (4)
);
(1) Use the asset’s uri to uniquely identify it.
(2) Holds the human readable name as required.
(3) A timestamp is added and updated as required to support caching.
(4) Notes are stored in the same table as a collection.
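To make the effect of embedding notes in the asset row concrete, the following sketch models the table in plain Java rather than CQL. It is a hypothetical, single-threaded illustration (all class and method names are invented here, and it is not the service's memory store): deleting the row removes its notes, a note cannot be attached to a missing asset, and reading notes is a single lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-threaded model of the assets table; not the service's code.
final class EmbeddedNotesSketch {

    // One row: the notes live inside the asset, like the list<text> column.
    static final class AssetRow {
        final String uri;
        final String name;
        final List<String> notes = new ArrayList<>();

        AssetRow(String uri, String name) {
            this.uri = uri;
            this.name = name;
        }
    }

    private final Map<String, AssetRow> rows = new HashMap<>();

    void createAsset(String uri, String name) {
        rows.put(uri, new AssetRow(uri, name));
    }

    // Returns false when the asset does not exist, e.g. because it was deleted.
    boolean addNote(String uri, String note) {
        AssetRow row = rows.get(uri);
        if (row == null) {
            return false;
        }
        row.notes.add(note);
        return true;
    }

    // Removing the row removes its notes with it; there is no second table to clean up.
    void deleteAsset(String uri) {
        rows.remove(uri);
    }

    // A single lookup returns a copy of the notes; no join is involved.
    List<String> notesFor(String uri) {
        AssetRow row = rows.get(uri);
        return row == null ? Collections.<String>emptyList() : new ArrayList<>(row.notes);
    }
}
```

A real Cassandra-backed store would have to enforce the "no notes on deleted assets" rule explicitly when issuing updates; the sketch only shows the behaviour the schema is meant to support.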
2.2. Service Implementation
The asset services are written in Java using the Dropwizard framework. I planned to use Go as there seemed to be some interest in me showcasing go-restful and its Swagger support. However, the Go CQL client is under heavy development and isn’t well documented or feature complete. Dropwizard encourages production of a standalone jar file rather than a war deployed on a Java application server. This approach is consistent with a microservices architecture, and it accelerates development and testing. Also, it’s easy to run multiple copies of the application in Docker containers. I’ve included a Dockerfile in the build if you’d like to test it that way. Additionally, Dropwizard provides support for production deployment with integrated metrics and health checks. I’d looked at it last year when reading Building Microservices and wanted to try it out.
2.3. Building
Dropwizard encourages Maven, but I’ve been using Gradle for a couple of years and couldn’t make myself go back. If you have a version 8 JDK installed and it’s on your path or JAVA_HOME is properly set, just run the included Gradle wrapper file. It does require an internet connection to download Gradle and project dependencies. The result is an executable jar with all dependencies bundled in.
./gradlew build
Automated builds run on Travis CI with each push. In addition, there is an automated Docker build published to Docker Hub. To build it locally, run the following.
docker build -t amdonov/asset-service:latest .
2.4. Running
There are two data store implementations provided, one for Cassandra and a memory-based store.
2.4.1. Memory Store
No preparation is required to run the services with a memory store. Use the provided configuration file, sample-config.yaml.
java -jar build/libs/asset-service-0.1.0-SNAPSHOT-all.jar server sample-config.yaml
2.4.2. Cassandra Store
To run the Cassandra store, a running Cassandra cluster is required. One can easily be started with ccm. It was tested with the following.
ccm create test -v 2.2.4 -n 3 -s
There is a sample Cassandra configuration provided, cassandra-config.yaml. Start the Web services with the following.
java -jar build/libs/asset-service-0.1.0-SNAPSHOT-all.jar server cassandra-config.yaml
2.4.3. Services
If you use the provided configuration files, the services will be running on port 8080. The easiest way to test the services is with the included Swagger UI. It’s available at http://localhost:8080/swagger.
2.4.4. Admin Interface
There are some additional features included for future production deployment. They are accessible on port 8081 by default at http://localhost:8081/. They include health checks, metrics, and thread monitoring.
2.5. Metrics
All service calls leverage Dropwizard’s metrics collection. Metrics are available on the admin port, 8081 by default, at the path of /metrics?pretty=true. I didn’t perform any real load testing. I don’t have enough information about the load profile, and testing on a laptop rather than across geographically distributed data centers isn’t representative. However, in support of paging development, I created 10,000 assets with curl, so assets were created one at a time without concurrent reads. The results are posted below. For comparison, I did the same thing with the memory store.
Cassandra
"com.mycompany.assets.AssetResource.createAsset" : {
  "count" : 10000,
  "max" : 0.017478096000000002,
  "mean" : 0.002814376620566178,
  "min" : 0.001953089,
  "p50" : 0.002424475,
  "p75" : 0.0028103620000000003,
  "p95" : 0.004579411,
  "p98" : 0.007236607000000001,
  "p99" : 0.008595328000000001,
  "p999" : 0.01311428,
  "stddev" : 0.0012552083933711567,
  "m15_rate" : 10.092396615694161,
  "m1_rate" : 46.50558186597699,
  "m5_rate" : 25.105013904684526,
  "mean_rate" : 56.92798508217708, (1)
  "duration_units" : "seconds",
  "rate_units" : "calls/second"
}
(1) Averages 57 inserts/second with the service and database running locally. Data is written 3 times.
Memory
"com.mycompany.assets.AssetResource.createAsset" : {
  "count" : 10000,
  "max" : 0.001961846,
  "mean" : 1.1835166136245698E-4,
  "min" : 7.8756E-5,
  "p50" : 9.636100000000001E-5,
  "p75" : 1.1936600000000001E-4,
  "p95" : 2.30549E-4,
  "p98" : 2.98819E-4,
  "p99" : 4.07568E-4,
  "p999" : 0.001961846,
  "stddev" : 9.236737647115021E-5,
  "m15_rate" : 10.293621034472327,
  "m1_rate" : 58.078389437591916,
  "m5_rate" : 26.57566846742506,
  "mean_rate" : 76.92383281630354, (1)
  "duration_units" : "seconds",
  "rate_units" : "calls/second"
}
(1) Only 77 calls/second for the memory store. The CPU was fairly idle. A multi-threaded client is needed to exercise the service.
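As a starting point for the multi-threaded client mentioned above, here is a minimal sketch of a load driver that spreads calls over a fixed thread pool and reports the achieved call rate. The class name and the `request` callback are hypothetical; in a real test the callback would issue a POST to /asset, which is left out here so the block stays self-contained.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical load driver; the request itself is supplied by the caller.
final class ConcurrentLoadSketch {

    // Runs `request` totalCalls times across `threads` workers and
    // returns the observed throughput in calls/second.
    static double run(int threads, int totalCalls, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < totalCalls; i++) {
            pool.submit(request);
        }
        pool.shutdown(); // accept no new work, let queued calls finish
        try {
            if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("load run timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load run interrupted", e);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return totalCalls / seconds;
    }
}
```

Comparing the rate from, say, `run(1, 10000, call)` and `run(16, 10000, call)` would show whether the single-threaded client, rather than the service, was the bottleneck in the numbers above.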
3. Resources
3.1. Asset
3.1.1. Retrieve an asset
GET /asset
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
QueryParameter | uri | uri of asset to retrieve | true | string | |
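Because the asset identifier is itself a URI passed as a query parameter, a client has to percent-encode it before appending it to the request. The helper below is a small sketch of that step using only the JDK; the class name, method name, and base URL are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical client-side helper for building GET /asset and DELETE /asset URLs.
final class AssetUrlSketch {

    // Appends the asset uri as an encoded query parameter.
    static String assetUrl(String baseUrl, String assetUri) {
        try {
            return baseUrl + "/asset?uri=" + URLEncoder.encode(assetUri, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `assetUrl("http://localhost:8080", "http://example.com/assets/1")` yields `http://localhost:8080/asset?uri=http%3A%2F%2Fexample.com%2Fassets%2F1`.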
3.1.2. Delete an asset
DELETE /asset
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
QueryParameter | uri | uri of asset to delete | true | string | |
3.1.3. Create a new asset
POST /asset
3.1.4. Create a new note on an asset
POST /asset/note
3.1.5. Retrieve assets based on criteria
GET /asset/search
Parameters
Type | Name | Description | Required | Schema | Default |
---|---|---|---|---|---|
QueryParameter | page | result page to display | false | string | |
4. Definitions
4.1. Asset
Name | Description | Required | Schema | Default |
---|---|---|---|---|
uri | identifier | true | string | |
name | human readable label | true | string | |
notes | notes | false | string array | |
4.2. AssetSummary
Name | Description | Required | Schema | Default |
---|---|---|---|---|
uri | identifier | true | string | |
name | human readable label | true | string | |
4.3. Note
Name | Description | Required | Schema | Default |
---|---|---|---|---|
uri | asset identifier | true | string | |
note | note contents | true | string | |
4.4. SearchResult
Name | Description | Required | Schema | Default |
---|---|---|---|---|
assets | | false | AssetSummary array | |
nextPage | use this value as page argument to retrieve the next page of results | false | string | |
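The page parameter of GET /asset/search and the nextPage field of SearchResult together form a cursor. A client-side loop might look like the sketch below. It assumes that the first request omits page and that the last page carries no nextPage value; neither is stated explicitly above. The classes mirror the definitions in this section but are hypothetical stand-ins, and the HTTP call is abstracted behind a function so the loop can be shown on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical client-side paging loop; the HTTP call is supplied by the caller.
final class PagingSketch {

    static final class AssetSummary {
        final String uri;
        final String name;

        AssetSummary(String uri, String name) {
            this.uri = uri;
            this.name = name;
        }
    }

    static final class SearchResult {
        final List<AssetSummary> assets; // may be null, the field is optional
        final String nextPage;           // assumed null on the last page

        SearchResult(List<AssetSummary> assets, String nextPage) {
            this.assets = assets;
            this.nextPage = nextPage;
        }
    }

    // fetchPage receives the page token (null for the first request)
    // and returns the corresponding SearchResult.
    static List<AssetSummary> fetchAll(Function<String, SearchResult> fetchPage) {
        List<AssetSummary> all = new ArrayList<>();
        String page = null;
        do {
            SearchResult result = fetchPage.apply(page);
            if (result.assets != null) {
                all.addAll(result.assets);
            }
            page = result.nextPage; // opaque token, passed back unchanged
        } while (page != null);
        return all;
    }
}
```

Treating the token as opaque keeps the client independent of how the service encodes its paging state.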