class staticanalytics. CountWindowGk
Source: analyticsdoc.
Greenwald - Khanna algorithm for quantile estimation on sliding windows. Given a cumulative probability p, the algorithm returns the approximate value of the p-th quantile of all the values in a sliding window.
The algorithm works by keeping a summary of buckets. Each bucket summarizes a range of values. Through the run of the algorithm, new buckets are created and old ones merged. To allow for the computation on a sliding window, each bucket uses an Exponential Histogram structure to remember how many values it summarizes.
It is summarized in: "Online Algorithm for Approximate Quantile Queries on Sliding Windows" http://dl.acm.org/citation.cfm?id=2954329
The error is not bounded by the absolute value of the output, but by the error in rank of the output element. For instance, if we have 100 elements and query the median, we could get the 48-th (p=0.48), 50-th (p=0.5), 51-th (p=0.51), etc. element.
The algorithms error is bounded by two factors. The first is the quantile estimation factor quantileEps
occurs because of the summary structure and defines the maximum size of the buckets. The second type of error countEps
occurs because of the Exponential Histograms inside the buckets.
The worst-case error is bounded by (quantileEps + 2*countEps + O(countEps^2)), although in practice the error is lower.
This version of the algorithm uses a count-based fixed size sliding window.
new CountWindowGk([arg])
Example
// import modules
var qm = require('qminer');
var fs = qm.fs;
var analytics = qm.analytics;
// create the default TDigest object
var gk = new analytics.CountWindowGk({
windowSize: 5,
quantileEps: 0.001,
countEps: 0.0005
});
// create the data used for calculating quantiles
var inputs = [10, 1, 2, 8, 9, 5, 6, 4, 7, 3];
// fit the TDigest model
for (var i = 0; i < inputs.length; i++) {
gk.partialFit(inputs[i]);
}
// make the prediction for the 0.1 quantile
var prediction = gk.predict(0.1);
// save the model
gk.save(fs.openWrite('gk.bin')).close();
// open the gk model under a new variable
var gk2 = new analytics.CountWindowGk(fs.openRead('gk.bin'));
Parameter
Name | Type | Optional | Description |
---|---|---|---|
arg |
(module:analytics~FixedWindowGkParam or module:fs.FIn) |
Yes |
Construction arguments. There are two ways of constructing:
|
Methods
staticpartialFit(val) → module:analytics.CountWindowGk
Appends a new value to the sliding window. If an old value falls outside the sliding window, it is forgotten.
Example
var qm = require('qminer');
var gk = new qm.analytics.CountWindowGk();
gk.partialFit(1.0);
gk.partialFit(2.0);
Parameter
Name | Type | Optional | Description |
---|---|---|---|
val |
number |
|
the value |
- Returns
-
module:analytics.CountWindowGk
reference to self
staticsave(fout) → module:fs.FOut
Saves the objects state into a binary file.
Example
var qm = require('qminer');
var fs = qm.fs;
var gk = new qm.analytics.CountWindowGk();
// save the model
gk.save(fs.openWrite('gk.bin')).close();
// open the model under a new variable
var gk = new analytics.CountWindowGk(fs.openRead('gk.bin'));
Parameter
Name | Type | Optional | Description |
---|---|---|---|
fout |
|
the output stream |
- Returns
-
module:fs.FOut
the output streamfout
getParams() → module:analytics~FixedWindowGkParam
Returns the models' parameters as a JavaScript object (JSON). These parameters are the same as are set through the constructor.
- Returns
-
module:analytics~FixedWindowGkParam
The construction parameters.var analytics = qm.analytics; var gk = new qm.analytics.CountWindowGk({ windowSize: 100 }); // window 100 elements long gk.partialFit(1.0); gk.partialFit(2.0); gk.partialFit(1.0); gk.partialFit(3.0); gk.partialFit(2.0); var params = gk.getParams();
console.log(params.windowSize); console.log(params.quantileEps); console.log(params.countEps);
predict(p) → number
Given an input cumulative probability, returns a quantile associated with that probability (e.g. for input 0.5 it will return the median).
Example
var qm = require('qminer');
var gk = new qm.analytics.CountWindowGk({
windowSize: 100 // window 100 elements long
});
gk.partialFit(1.0);
gk.partialFit(2.0);
gk.partialFit(1.0);
gk.partialFit(3.0);
gk.partialFit(2.0);
console.log(gk.predict(0.01)); // prints the first percentile
console.log(gk.predict(0.25)); // prints the first quartile
console.log(gk.predict(0.5)); // prints the median
Parameter
Name | Type | Optional | Description |
---|---|---|---|
p |
number |
|
cumulative probability between 0 and 1 (both inclusive) |
- Returns
-
number
quantile associated with p