1. Overview

This is a sample service for managing assets.

1.1. Version information

Version: 0.1.0-SNAPSHOT

1.2. Contact information

Contact: Aaron Donovan

1.3. License information

License: Apache 2.0 License URL: http://www.apache.org/licenses/LICENSE-2.0.html

1.4. URI scheme

BasePath: /

1.5. Tags

  • asset

2. Background

The asset Web services consist of RESTful services fronting a distributed database. Copies of the service would be deployed to all data centers. The services are stateless and should be scaled horizontally behind load balancers. Each data center would maintain a full copy of all data, preferably three copies as availability is a stated priority. Communication between the services and data nodes is configurable based upon consistency requirements. However, in the most likely configuration, the asset Web services would read and write only to data nodes in the local data center. Data nodes would be responsible for replicating writes across data centers.

2.1. Database Recommendation

Requirements prioritize availability and partition tolerance above consistency and explicitly state that eventual consistency is acceptable/expected. Cassandra is well-suited to these requirements. Many distributed databases use master/slave replication which have worse write performance and more complex failure characteristics. Cassandra is data center aware and provides a replication approach that has been validated by numerous internet scale companies such as Netflix and Facebook. I have never used Cassandra before, but preliminary research pointed me in that direction. I then read Cassandra High Availability and became more confident in the choice. The limited prototype works well.

2.1.1. Resolving Data Inconsistencies

Cassandra is capable of recovering automatically from temporary network failures. All database operations include a timestamp. When resolving conflicts, the most recent timestamp wins. In the extremely unlikely case that two conflicting operations occurred at the exact same microsecond, Cassandra uses the byte value of the data to consistently resolve the conflict. There are two important time windows for recovering from node/network failures in Cassandra. The time that “hinted handoffs” are stored and the time tombstones, markers for deletes are stored. Additional recovery steps are required if these windows are exceeded.

2.1.2. Database Schema

Unlike most relational databases, Cassandra supports storage of multiple values in a column. This approach is used to store notes. It assists with the requirements that notes are deleted when an asset is deleted and that notes cannot be added to deleted assets. In addition, joins are not required to retrieve an assets notes.

CREATE TABLE assets (
  uri text PRIMARY KEY, (1)
  name text, (2)
  modtime timestamp, (3)
  notes list<text> (4)
);
  1. Use the asset’s uri to uniquely identify it.

  2. Holds the human readable name as required.

  3. A timestamp is added and updated as required to support caching.

  4. Notes are stored in the same table as a collection.

2.2. Service Implementation

The asset services are written in Java using the Dropwizard framework. I planned to use Go as there seemed to be some interest in me showcasing go-restful and its Swagger support. However, the go CQL client is under heavy development and isn’t well documented or feature complete. Dropwizard encourages production of a standalone jar file rather than a war deployed on a Java application server. This approach is consistent with a Microservices architecture. Development and testing are accelerated. Also, it’s easy to run multiple copies of the application in Docker containers. I’ve included a Dockerfile in the build if you’d like to test it that way. Additionally, Dropwizard provides support for production deployment with integrated metrics and health checks. I’d looked last year when reading Building Microservices and wanted to try it out.

2.3. Building

Dropwizard encourages Maven, but I’ve been using Gradle for a couple of years and couldn’t make myself go back. If you have a version 8 JDK installed and it’s on your path or JAVA_HOME is properly set, just run the included Gradle wrapper file. It does require an internet connection to download Gradle and project dependencies. The result is an executable jar with all dependencies bundled in.

./gradlew build

Automated builds run on travis-ci with each push. In addition, there is a automated Docker build published to docker hub. To build it locally run the following.

docker build -t amdonov/asset-service:latest .

2.4. Running

There are two data store implementations provided, one for Cassandra and a memory-based store.

2.4.1. Memory Store

No preparation is required to run the services with with a memory store. Use the provided configuration file, sample-config.yaml

java -jar build/libs/asset-service-0.1.0-SNAPSHOT-all.jar server sample-config.yaml

2.4.2. Cassandra Store

To run the Cassandra store, a running Cassandra cluster is required. One can easily be started with ccm. It was tested with the following.

ccm create test -v 2.2.4 -n 3 -s

There is a sample cassandra configuration provided, cassandra-config.yaml. Start the Web services with the following.

java -jar build/libs/asset-service-0.1.0-SNAPSHOT-all.jar server cassandra-config.yaml

2.4.3. Services

If you use the provided configuration files, the services will be running on port 8080. The easiest way to test the services is with included Swagger UI. It’s available at http://localhost:8080/swagger.

2.4.4. Admin Interface

There are some additional features included for future production deployment. They are accessable on port 8081 by default at http:localhost:8081/ They included health checks, metrics, and thread monitoring.

2.4.5. Docker

If desirable, the services can run in docker.

docker run -it -p 8080:8080 -p 8081:8081 amdonov/asset-service server sample-config.yaml

2.5. Metrics

All service calls leverage Dropwizard’s metrics collection. Metrics are available on the admin port, 8081 by default, at the path of /metrics?pretty=true. I didn’t perform any real load testing. I don’t have enough information about load profile and testing on laptop vice geographically distributed datacenters isn’t representative. However, in support of paging development, I created 10,000 assets with curl, so assets were created one at a time without concurrent reads. The results are posted below. For comparison, I did the same thing with the memory store.

Cassandra

 "com.mycompany.assets.AssetResource.createAsset" : {
       "count" : 10000,
       "max" : 0.017478096000000002,
       "mean" : 0.002814376620566178,
       "min" : 0.001953089,
       "p50" : 0.002424475,
       "p75" : 0.0028103620000000003,
       "p95" : 0.004579411,
       "p98" : 0.007236607000000001,
       "p99" : 0.008595328000000001,
       "p999" : 0.01311428,
       "stddev" : 0.0012552083933711567,
       "m15_rate" : 10.092396615694161,
       "m1_rate" : 46.50558186597699,
       "m5_rate" : 25.105013904684526,
       "mean_rate" : 56.92798508217708, (1)
       "duration_units" : "seconds",
       "rate_units" : "calls/second"
     }
  1. Averages 57 inserts/second with service database running locally. Data is written 3 times.

Memory

"com.mycompany.assets.AssetResource.createAsset" : {
      "count" : 10000,
      "max" : 0.001961846,
      "mean" : 1.1835166136245698E-4,
      "min" : 7.8756E-5,
      "p50" : 9.636100000000001E-5,
      "p75" : 1.1936600000000001E-4,
      "p95" : 2.30549E-4,
      "p98" : 2.98819E-4,
      "p99" : 4.07568E-4,
      "p999" : 0.001961846,
      "stddev" : 9.236737647115021E-5,
      "m15_rate" : 10.293621034472327,
      "m1_rate" : 58.078389437591916,
      "m5_rate" : 26.57566846742506,
      "mean_rate" : 76.92383281630354, (1)
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    }
  1. On 77 calls/second for the memory store. The CPU was fairly idle. Need a multi-threaded client to excercise the service.

3. Resources

3.1. Asset

3.1.1. Retrieve an asset

GET /asset
Parameters
Type Name Description Required Schema Default

QueryParameter

uri

uri of asset to retrieve

true

string

Responses
HTTP Code Description Schema

200

success

Asset

304

asset not modified

No Content

400

invalid request

No Content

404

asset not found

No Content

500

unexpected error

No Content

Consumes
  • application/json

Produces
  • application/json

3.1.2. Delete an asset

DELETE /asset
Parameters
Type Name Description Required Schema Default

QueryParameter

uri

uri of asset to delete

true

string

Responses
HTTP Code Description Schema

204

asset deleted

No Content

400

invalid request

No Content

404

asset not found

No Content

500

unexpected error

No Content

Consumes
  • application/json

3.1.3. Create a new asset

POST /asset
Parameters
Type Name Description Required Schema Default

BodyParameter

body

asset to create

false

Asset

Responses
HTTP Code Description Schema

201

asset created

No Content

400

asset is invalid such as missing or invalid name or uri

No Content

409

asset exists with that uri

No Content

500

unexpected error

No Content

Consumes
  • application/json

Produces
  • application/json

3.1.4. Create a new note on an asset

POST /asset/note
Parameters
Type Name Description Required Schema Default

BodyParameter

body

note to create

false

Note

Responses
HTTP Code Description Schema

201

note created

No Content

400

note is invalid

No Content

404

asset not found

No Content

500

unexpected error

No Content

Consumes
  • application/json

3.1.5. Retrieve assets based on criteria

GET /asset/search
Parameters
Type Name Description Required Schema Default

QueryParameter

page

result page to display

false

string

Responses
HTTP Code Description Schema

200

success

SearchResult

400

invalid request such as a bad page

No Content

500

unexpected error

No Content

Consumes
  • application/json

Produces
  • application/json

4. Definitions

4.1. Asset

Name Description Required Schema Default

uri

identifier

true

string

name

human readable label

true

string

notes

notes

false

string array

4.2. AssetSummary

Name Description Required Schema Default

uri

identifier

true

string

name

human readable label

true

string

4.3. Note

Name Description Required Schema Default

uri

asset identifier

true

string

note

note contents

true

string

4.4. SearchResult

Name Description Required Schema Default

assets

false

AssetSummary array

nextPage

use this value as page argument to retrieve the next page of results

false

string