analytics. Gk
Source: analyticsdoc.
Greenwald - Khanna algorithm for online quantile estimation. Given a comulative probability p, the algorithm returns the approximate value of the p-th quantile.
The algorithm works by keeping a summary of buckets, each summarizing a range of values. Through the run of the algorithm new buckets are created and periodically merged if possible.
It is was first explained in: "Space-Efficient Online Computation of Quantile Summaries" http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf
The error is bounded by the rank of the output element (not by the absolute value). Specifically, the worst case error in rank is bounded by eps*n, where n is the number of elements in the summary.
new Gk([arg])
Example
// import modules
var qm = require('qminer');
var fs = require('qminer').fs;
var quants = qm.analytics.quantiles;
// create the Gk object
var gk = new quants.Gk({
eps: 0.001,
autoCompress: true
});
// create the data used for calculating quantiles
var inputs = [10, 1, 2, 8, 9, 5, 6, 4, 7, 3];
// fit the model
for (var i = 0; i < inputs.length; i++) {
gk.insert(inputs[i]);
}
// make the estimation for the 0.1 quantile
var quant = gk.quantile(0.1);
// save the model
gk.save(fs.openWrite('gk.bin')).close();
// open the gk model under a new variable
var gk2 = new quants.Gk(fs.openRead('gk.bin'));
Parameter
Name | Type | Optional | Description |
---|---|---|---|
arg |
(module:analytics.quantiles~GkParam or module:fs.FIn) |
Yes |
Construction arguments. There are two ways of constructing:
|
Properties
init
Returns the current size of the algorithms summary in number of tuples.
memory
Returns the models current memory consumption.
samples
Returns the number of samples seen by the model.
size
Returns the current size of the algorithms summary in number of tuples.
Methods
compress(val) → module:analytics.quantiles.Gk
Adds a new value to the summary.
Example
var qm = require('qminer');
var gk = new qm.analytics.quantiles.Gk();
gk.insert(1.0);
gk.insert(2.0);
Parameter
Name | Type | Optional | Description |
---|---|---|---|
val |
number |
|
the value |
- Returns
-
module:analytics.quantiles.Gk
B reference to self
compress()
Manually runs the compression procedure.
- Returns
-
reference to self
kolmogorovStat(distribution) → number
Compares this distribution to dist
and returns the Kolmogorov-Smirnov
statistic:
D_n,m = sup_x|f1(x) - f2(x)|
where f1 and f2 are cumulative distribution function of this distribution and
dist
respectively.
Parameter
Name | Type | Optional | Description |
---|---|---|---|
distribution |
module:analytics.quantiles.quantiles.Gk |
|
the distribution to compare against |
- Returns
-
number
B - the K-S statistic
kolmogorovTest(distribution, alpha) → boolean
Compares this distribution to dist
using the Kolmogorov-Smirnov test with
significance alpha
.
Parameters
Name | Type | Optional | Description |
---|---|---|---|
distribution |
module:analytics.quantiles.quantiles.Gk |
|
the distribution to compare against |
alpha |
number |
|
the statistical significance |
- Returns
-
boolean
B - true if the distributions differ
save(fout) → module:fs.FOut
Saves the objects state into the output stream.
Parameter
Name | Type | Optional | Description |
---|---|---|---|
fout |
|
the output stream |
- Returns
-
module:fs.FOut
B - the output stream
cdf(vals) → (number or Array)
Provided a given value or array of values it returns the corresponding values of the cumulative distribution function.
Example
var qm = require('qminer');
var gk = new qm.analytics.quantiles.Gk({
eps: 0.1
});
gk.insert(1.0);
gk.insert(2.0);
gk.insert(1.0);
gk.insert(3.0);
gk.insert(2.0);
console.log(gk.cdf(0)); // prints the CDF for x = 0
console.log(gk.cdf(2)); // prints the CDF for x = 2
console.log(gk.cdf(4)); // prints the CDF for x = 4
Parameter
Name | Type | Optional | Description |
---|---|---|---|
vals |
(number or Array) |
|
the values which we a querying (quantiles) |
- Returns
-
(number or Array)
B pVals - depending whether the input was a single value or array the method returns a probability or array of probabilities
getParams() → module:analytics.quantiles~GkParam
Returns the models' parameters as a JavaScript object (JSON). These parameters are the same as are set through the constructor.
- Returns
-
module:analytics.quantiles~GkParam
B The construction parameters.var analytics = qm.analytics; var gk = new analytics.quantiles.Gk(); var params = gk.getParams();
console.log(params.eps); console.log(params.autoCompress);
quantile(pVals) → (number or Array)
Given an input cumulative probability, returns a quantile associated with that probability (e.g. for input 0.5 it will return the median).
Example
var qm = require('qminer');
var gk = new qm.analytics.quantiles.Gk({
eps: 0.1
});
gk.insert(1.0);
gk.insert(2.0);
gk.insert(1.0);
gk.insert(3.0);
gk.insert(2.0);
console.log(gk.quantile(0.01)); // prints the first percentile
console.log(gk.quantile(0.25)); // prints the first quartile
console.log(gk.quantile(0.5)); // prints the median
Parameter
Name | Type | Optional | Description |
---|---|---|---|
pVals |
(number or Array) |
|
the p-values which we a querying |
- Returns
-
(number or Array)
B quantiles - depending whether the input was a single value or array the method returns a quantile or array of quantiles