Module: analytics

Analytics module.

Example

// import modules
var qm = require('qminer');
var analytics = qm.analytics;
// load dataset, create model, evaluate model

Child classes

BiasedGk([arg])
BufferedTDigest([arg])
DpMeans([arg])
Gk([arg])
KMeans([arg])
LogReg([arg])
MDS([arg])
NearestNeighborAD([arg])
NNet([arg])
OneVsAll([arg])
PCA([arg])
PropHazards([arg])
RecLinReg(arg)
RecommenderSys([arg])
RidgeReg([arg])
Sigmoid([arg])
SVC([arg])
SVR([arg])
TDigest([arg])
ThresholdModel([arg])
Tokenizer([arg])
ActiveLearner([arg])

Namespaces

metrics

preprocessing

Method

nmf(mat, k[, json])

Abstract types

ActiveLearnerParam
BiasedGkParam
BufferedDigestParam
detectorParam
DpMeansExplain
DpMeansParam
GkParam
hazardModelParam
KMeansExplain
KMeansParam
logisticRegParam
MDSParam
NearestNeighborADExplain
NearestNeighborADFeatureContribution
nnetParam
oneVsAllParam
PCAParam
recLinRegParam
RecSysParam
ridgeRegParam
SVMParam
TDigestParam
tokenizerParam

Method

nmf(mat, k[, json]) → Object

Calculates the non-negative matrix factorization, see: https://en.wikipedia.org/wiki/Non-negative_matrix_factorization.

Examples

Asynchronous function

// import modules
var analytics = require('qminer').analytics;
var la = require('qminer').la;
// create a matrix
var mat = new la.Matrix({ rows: 10, cols: 5, random: true });
// compute the non-negative matrix factorization
analytics.nmfAsync(mat, 3, { iter: 100, tol: 1e-4 }, function (err, result) {
   if (err) { console.log(err); return; }
   // calculation successful
   var U = result.U;
   var V = result.V;
});

Synchronous function

// import modules
var analytics = require('qminer').analytics;
var la = require('qminer').la;
// create a matrix
var mat = new la.Matrix({ rows: 10, cols: 5, random: true });
// compute the non-negative matrix factorization
var result = analytics.nmf(mat, 3, { iter: 100, tol: 1e-4 });
var U = result.U;
var V = result.V;

Parameters

Name Type Optional Description

mat

(module:la.Matrix or module:la.SparseMatrix)

The non-negative matrix.

k

number

The reduced rank, i.e. the number of columns in matrix U and the number of rows in matrix V. Must be between 0 and min(mat.rows, mat.cols).

json

Object

Yes

Algorithm options.

Values in json have the following properties:

Name Type Optional Description

iter

number

Yes

The number of iterations used for the algorithm.

Defaults to 100.

tol

number

Yes

The tolerance.

Defaults to 1e-3.

verbose

boolean

Yes

If false, the console output is suppressed.

Defaults to false.

Returns: Object - the JSON object nmfRes containing the non-negative matrices U and V:
nmfRes.U- The module:la.Matrix representation of the matrix U,
nmfRes.V- The module:la.Matrix representation of the matrix V.
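The `iter` and `tol` options point to an iterative solver. The plain JavaScript sketch below illustrates what `U` and `V` represent. It uses the classic Lee-Seung multiplicative update rules for the Frobenius loss. This is an assumption for illustration only, and qminer's internal algorithm may differ. The helpers `nmfSketch`, `matMul`, `transpose` and `frobeniusDiff` are hypothetical and operate on arrays of arrays rather than `la.Matrix` objects.

```javascript
// Hypothetical helpers working on plain arrays of arrays (not la.Matrix).
function matMul(A, B) {
  const C = A.map(() => new Array(B[0].length).fill(0));
  for (let i = 0; i < A.length; i++)
    for (let t = 0; t < B.length; t++)
      for (let j = 0; j < B[0].length; j++) C[i][j] += A[i][t] * B[t][j];
  return C;
}

function transpose(A) {
  return A[0].map((_, j) => A.map(row => row[j]));
}

function frobeniusDiff(A, B) {
  let s = 0;
  A.forEach((row, i) => row.forEach((v, j) => { s += (v - B[i][j]) ** 2; }));
  return Math.sqrt(s);
}

// Lee-Seung multiplicative updates: X (rows x cols) ~ U (rows x k) * V (k x cols).
function nmfSketch(X, k, { iter = 100, tol = 1e-3 } = {}) {
  const tiny = 1e-12; // guards against division by zero
  // Deterministic positive start; real implementations usually start from random values.
  let U = X.map((_, i) => Array.from({ length: k }, (_, j) => 1 + ((i + 2 * j) % 3) / 3));
  let V = Array.from({ length: k }, (_, i) => X[0].map((_, j) => 1 + ((2 * i + j) % 3) / 3));
  let prevErr = Infinity;
  for (let it = 0; it < iter; it++) {
    // V <- V .* (U^T X) ./ (U^T U V)
    const Ut = transpose(U);
    const numV = matMul(Ut, X);
    const denV = matMul(matMul(Ut, U), V);
    V = V.map((row, i) => row.map((v, j) => v * numV[i][j] / (denV[i][j] + tiny)));
    // U <- U .* (X V^T) ./ (U V V^T)
    const Vt = transpose(V);
    const numU = matMul(X, Vt);
    const denU = matMul(U, matMul(V, Vt));
    U = U.map((row, i) => row.map((u, j) => u * numU[i][j] / (denU[i][j] + tiny)));
    const err = frobeniusDiff(X, matMul(U, V));
    if (prevErr - err < tol) break; // stop once the error no longer improves by tol
    prevErr = err;
  }
  return { U, V };
}

// Usage: a rank-1 matrix is reconstructed almost exactly with k = 1.
const X = [[1, 2], [2, 4], [3, 6]];
const result = nmfSketch(X, 1, { iter: 100, tol: 1e-4 });
console.log(frobeniusDiff(X, matMul(result.U, result.V)));
```

Each update only rescales entries by ratios of non-negative quantities. Non-negative input and a positive start therefore keep `U` and `V` non-negative throughout.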

Abstract types

inner

ActiveLearnerParam Object

An object used for the construction of module:analytics.ActiveLearner.

Properties

Name Type Optional Description

learner

Object

Yes

Learner parameters

Values in learner have the following properties:

Name Type Optional Description

disableAsserts

boolean

Yes

Disables input assertions.

Defaults to false.

SVC

module:analytics~SVMParam

Yes

Support vector classifier parameters.

inner

BiasedGkParam Object

An object used for the construction of module:analytics.quantiles.BiasedGk.

Properties

Name	Type	Optional	Description
targetProb	number	Yes	The probability where the algorithm is most accurate. Its accuracy is determined as eps * max(p, targetProb) when targetProb < 0.5 and eps * max(1-p, 1-targetProb) when targetProb >= 0.5. Higher values of `targetProb` allow for a smaller memory footprint. Defaults to `0.01`.
eps	number	Yes	Parameter which determines the accuracy. Defaults to `0.1`.
compression	string	Yes	Determines when the algorithm compresses its summary. Options are: "periodic", "aggressive" and "manual". Defaults to `"periodic"`.
useBands	boolean	Yes	Whether the algorithm should use the 'band' subprocedure. Using this subprocedure should result in a smaller summary. Defaults to `true`.
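The hypothetical helper below evaluates the documented `targetProb`/`eps` accuracy formula for a query probability `p`, which makes it concrete. It only restates the formula and does not implement the biased GK summary.

```javascript
// Hypothetical helper restating the documented BiasedGk accuracy formula.
function biasedGkError(p, targetProb = 0.01, eps = 0.1) {
  if (targetProb < 0.5) {
    return eps * Math.max(p, targetProb); // most accurate in the lower tail
  }
  return eps * Math.max(1 - p, 1 - targetProb); // most accurate in the upper tail
}

console.log(biasedGkError(0.001)); // low tail, default targetProb
console.log(biasedGkError(0.5));   // median
```

With the defaults, the bound is 0.001 at p = 0.001 and 0.05 at the median. This is why the summary is most accurate for p at or below `targetProb`.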

inner

BufferedDigestParam Object

An object used for the construction of module:analytics.quantiles.BufferedTDigest.

Properties

Name Type Optional Description

delta

number

Yes

The number of clusters in the summary is bounded by floor(minClusters) <= clusters < 2*ceil(minClusters).

Defaults to 100.

bufferLen

number

Yes

The size of the buffer is minClusters*bufferLenFactor. When the buffer fills, it is merged with the summary. The algorithm also initializes after seeing minClusters*bufferLenFactor examples.

Defaults to 1000.

seed

number

Yes

Random seed (values above 1 are deterministic).

Defaults to 0.

inner

detectorParam Object

An object used for the construction of module:analytics.NearestNeighborAD.

Parameters

Name Type Optional Description

rate

number

Yes

The expected fraction of emitted anomalies (0.05 means that 5% of cases are classified as anomalies).

Defaults to 0.05.

windowSize

number

Yes

Number of most recent instances kept in the model.

Defaults to 100.
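The sketch below shows how `rate` and `windowSize` could interact in a nearest-neighbour anomaly detector. Under this assumption, the model keeps the `windowSize` most recent points and scores a new point by its distance to the nearest stored point. It flags the point when that score exceeds a threshold chosen so that roughly a `rate` fraction of the window's own leave-one-out scores lie above it. The class `WindowedNNDetector` is hypothetical and is not qminer's NearestNeighborAD implementation.

```javascript
// Hypothetical windowed nearest-neighbour anomaly detector (illustration only).
class WindowedNNDetector {
  constructor({ rate = 0.05, windowSize = 100 } = {}) {
    this.rate = rate;
    this.windowSize = windowSize;
    this.points = [];
  }
  static dist(a, b) {
    return Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));
  }
  // Distance from x to its nearest neighbour among `others`.
  static nnDist(x, others) {
    return others.reduce((m, p) => Math.min(m, WindowedNNDetector.dist(x, p)), Infinity);
  }
  // Threshold = the (1 - rate) quantile of leave-one-out scores in the window.
  threshold() {
    const scores = this.points
      .map((p, i) => WindowedNNDetector.nnDist(p, this.points.filter((_, j) => j !== i)))
      .sort((a, b) => a - b);
    const idx = Math.min(scores.length - 1, Math.floor((1 - this.rate) * scores.length));
    return scores[idx];
  }
  isAnomaly(x) {
    if (this.points.length < 2) return false; // not enough data yet
    return WindowedNNDetector.nnDist(x, this.points) > this.threshold();
  }
  update(x) {
    this.points.push(x);
    if (this.points.length > this.windowSize) this.points.shift(); // drop the oldest point
  }
}

// Usage: 50 evenly spaced 1-d points, then one nearby and one distant query.
const detector = new WindowedNNDetector({ rate: 0.05, windowSize: 100 });
for (let i = 0; i < 50; i++) detector.update([i]);
console.log(detector.isAnomaly([10.5]), detector.isAnomaly([100]));
```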

inner

DpMeansExplain Object

The explanation returned by module:analytics.DpMeans#explain.

Properties

Name	Type	Description
medoidID	number	The ID of the nearest medoid.
featureIDs	module:la.IntVector	The IDs of features, sorted by contribution.
featureContributions	module:la.Vector	Weights of each feature contribution (sum to 1.0).

inner

DpMeansParam Object

An object used for the construction of module:analytics.DpMeans.

Properties

Name Type Optional Description

iter

number

Yes

The maximum number of iterations.

Defaults to 10000.

lambda

number

Yes

Maximum radius of the clusters.

Defaults to 1.

minClusters

number

Yes

Minimum number of clusters.

Defaults to 2.

maxClusters

number

Yes

Maximum number of clusters.

Defaults to inf.

allowEmpty

boolean

Yes

Whether to allow empty clusters to be generated.

Defaults to true.

calcDistQual

boolean

Yes

Whether to calculate the distance-based quality measure. If false, relMeanCentroidDist returns undefined.

Defaults to false.

centroidType

string

Yes

The type of centroids. Possible options are 'Dense' and 'Sparse'.

Defaults to "Dense".

distanceType

string

Yes

The distance type used in the calculations. Possible options are 'Euclid' and 'Cos'.

Defaults to "Euclid".

verbose

boolean

Yes

If false, the console output is suppressed.

Defaults to false.

fitIdx

Array of number

Yes

The index array used for the construction of the initial centroids.

fitStart

Object

Yes

The KMeans model returned by module:analytics.KMeans.prototype.getModel used for centroid initialization.

Values in fitStart have the following properties:

Name	Type	Optional	Description
C	(module:la.Matrix or module:la.SparseMatrix)		The centroid matrix.
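The description of `lambda` as the maximum cluster radius matches the DP-means idea (Kulis and Jordan). A point that is farther than `lambda` from every centroid opens a new cluster. The sketch below illustrates this under that assumption. The function `dpMeansSketch` is hypothetical and omits `minClusters`, `maxClusters` and the other options, so qminer's DpMeans may behave differently.

```javascript
// Hypothetical DP-means sketch: lambda is treated as the maximum point-to-centroid distance.
function dpMeansSketch(points, { lambda = 1, iter = 10000 } = {}) {
  const dist = (a, b) => Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));
  let centroids = [points[0].slice()]; // start with a single cluster
  const assign = new Array(points.length).fill(0);
  for (let it = 0; it < iter; it++) {
    let changed = false;
    // Assignment step: open a new cluster when no centroid is within lambda.
    points.forEach((p, i) => {
      let best = 0, bestD = Infinity;
      centroids.forEach((c, k) => {
        const d = dist(p, c);
        if (d < bestD) { bestD = d; best = k; }
      });
      if (bestD > lambda) {
        centroids.push(p.slice());
        best = centroids.length - 1;
      }
      if (assign[i] !== best) { assign[i] = best; changed = true; }
    });
    // Update step: each centroid moves to the mean of its points (empty clusters stay put).
    centroids = centroids.map((c, k) => {
      const members = points.filter((_, i) => assign[i] === k);
      if (members.length === 0) return c;
      return c.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
    if (!changed) break;
  }
  return { centroids, assign };
}

// Usage: two well-separated groups yield two clusters without specifying k.
const data = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]];
const dp = dpMeansSketch(data, { lambda: 1 });
console.log(dp.centroids.length, dp.assign);
```

Unlike k-means, the number of clusters is not fixed in advance. Smaller `lambda` values produce more clusters.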

inner

GkParam Object

An object used for the construction of module:analytics.quantiles.Gk.

Properties

Name Type Optional Description

eps

number

Yes

Determines the relative error of the algorithm.

Defaults to 0.01.

autoCompress

boolean

Yes

Whether the summary should be compressed automatically or manually.

Defaults to true.

useBands

boolean

Yes

Whether the algorithm should use the 'band' subprocedure. Using this subprocedure should result in a smaller summary.

Defaults to true.
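Assume `eps` plays its usual role in the Greenwald-Khanna algorithm. A quantile query for probability p over n values then returns a value whose rank lies within eps * n of p * n. The hypothetical helper below only computes that acceptable rank window and does not build the summary.

```javascript
// Hypothetical helper: the rank window allowed by an eps-approximate quantile query,
// assuming the usual Greenwald-Khanna guarantee |rank - p * n| <= eps * n.
function gkRankWindow(n, p, eps = 0.01) {
  const target = p * n;
  const slack = eps * n;
  return {
    minRank: Math.max(1, Math.floor(target - slack)),
    maxRank: Math.min(n, Math.ceil(target + slack)),
  };
}

// For 1000 values and eps = 0.01, a median query may return any value
// whose rank lies roughly between 490 and 510.
console.log(gkRankWindow(1000, 0.5, 0.01));
```

The window grows linearly with n, so for a fixed `eps` the absolute rank error increases with the amount of data while the relative error stays constant.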

inner

hazardModelParam Object

An object used for the construction of module:analytics.PropHazards.

Property

Name Type Optional Description

lambda

number

Yes

The regularization parameter.

Defaults to 0.

inner

KMeansExplain Object

The explanation returned by module:analytics.KMeans#explain.

Properties

Name	Type	Description
medoidID	number	The ID of the nearest medoid.
featureIDs	module:la.IntVector	The IDs of features, sorted by contribution.
featureContributions	module:la.Vector	Weights of each feature contribution (sum to 1.0).

inner

KMeansParam Object

An object used for the construction of module:analytics.KMeans.

Properties

Name Type Optional Description

iter

number

Yes

The maximum number of iterations.

Defaults to 10000.

k

number

Yes

The number of centroids.

Defaults to 2.

allowEmpty

boolean

Yes

Whether to allow empty clusters to be generated.

Defaults to true.

calcDistQual

boolean

Yes

Whether to calculate the distance-based quality measure. If false, relMeanCentroidDist returns undefined.

Defaults to false.

centroidType

string

Yes

The type of centroids. Possible options are 'Dense' and 'Sparse'.

Defaults to "Dense".

distanceType

string

Yes

The distance type used in the calculations. Possible options are 'Euclid' and 'Cos'.

Defaults to "Euclid".

verbose

boolean

Yes

If false, the console output is suppressed.

Defaults to false.

fitIdx

Array of number

Yes

The index array used for the construction of the initial centroids.

fitStart

Object

Yes

The KMeans model returned by module:analytics.KMeans.prototype.getModel used for centroid initialization.

Values in fitStart have the following properties:

Name	Type	Optional	Description
C	(module:la.Matrix or module:la.SparseMatrix)		The centroid matrix.
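The `k`, `iter`, `distanceType` and `fitIdx` options map naturally onto Lloyd's k-means iteration. The sketch below illustrates this, assuming that `fitIdx` selects the points used as initial centroids and that 'Cos' means cosine distance. The function `kMeansSketch` is hypothetical and is not the qminer KMeans API.

```javascript
// Hypothetical Lloyd's k-means sketch with Euclidean or cosine distance.
function kMeansSketch(points, { k = 2, iter = 10000, distanceType = 'Euclid', fitIdx } = {}) {
  const dot = (a, b) => a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = a => Math.sqrt(dot(a, a));
  const dist = distanceType === 'Cos'
    ? (a, b) => 1 - dot(a, b) / (norm(a) * norm(b) || 1) // cosine distance
    : (a, b) => Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));
  // Initial centroids: the points at fitIdx, or simply the first k points.
  const init = fitIdx ?? Array.from({ length: k }, (_, i) => i);
  let centroids = init.map(i => points[i].slice());
  let assign = [];
  for (let it = 0; it < iter; it++) {
    // Assignment step: each point goes to its nearest centroid.
    const next = points.map(p => {
      let best = 0;
      centroids.forEach((c, j) => { if (dist(p, c) < dist(p, centroids[best])) best = j; });
      return best;
    });
    const converged = next.every((a, i) => a === assign[i]);
    assign = next;
    if (converged) break;
    // Update step: centroids move to the mean of their members (empty clusters stay put).
    centroids = centroids.map((c, j) => {
      const members = points.filter((_, i) => assign[i] === j);
      if (members.length === 0) return c;
      return c.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { centroids, assign };
}

// Usage: two groups of three points each.
const pts = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]];
const km = kMeansSketch(pts, { k: 2, iter: 100 });
console.log(km.assign, km.centroids);
```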

inner

logisticRegParam Object

An object used for the construction of module:analytics.LogReg.

Properties

Name Type Optional Description

lambda

number

Yes

The regularization parameter.

Defaults to 1.

intercept

boolean

Yes

Indicates whether to automatically include the intercept.

Defaults to false.

inner

PCAParam Object

An object used for the construction of module:analytics.PCA.

Properties

Name Type Optional Description

k

number

Yes

Number of eigenvectors to be computed.

Defaults to null.

iter

number

Yes

Number of iterations.

Defaults to 100.

inner

recLinRegParam Object

An object used for the construction of module:analytics.RecLinReg.

Parameters

Name Type Optional Description

dim

number

The dimension of the model.

regFact

number

Yes

The regularization factor.

Defaults to 1.0.

forgetFact

number

Yes

The forgetting factor.

Defaults to 1.0.
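The `dim`, `regFact` and `forgetFact` parameters correspond to the standard recursive least squares (RLS) algorithm with a forgetting factor. The sketch below shows that algorithm under two assumptions: the inverse-covariance estimate starts as I / regFact, and forgetFact = 1 means no forgetting. The class `RlsSketch` is hypothetical and is not qminer's RecLinReg.

```javascript
// Hypothetical recursive least squares sketch with a forgetting factor.
class RlsSketch {
  constructor({ dim, regFact = 1.0, forgetFact = 1.0 }) {
    this.forgetFact = forgetFact;
    this.w = new Array(dim).fill(0);
    // P approximates the inverse of the regularised input covariance; assumed to start as I / regFact.
    this.P = Array.from({ length: dim }, (_, i) =>
      Array.from({ length: dim }, (_, j) => (i === j ? 1 / regFact : 0)));
  }
  predict(x) {
    return x.reduce((s, v, i) => s + v * this.w[i], 0);
  }
  partialFit(x, y) {
    const lam = this.forgetFact, n = x.length;
    const Px = this.P.map(row => row.reduce((s, v, j) => s + v * x[j], 0)); // P x
    const denom = lam + x.reduce((s, v, i) => s + v * Px[i], 0);            // lam + x^T P x
    const gain = Px.map(v => v / denom);                                     // gain vector
    const err = y - this.predict(x);                                         // a priori error
    this.w = this.w.map((wi, i) => wi + gain[i] * err);
    // P <- (P - gain (P x)^T) / lam, using the symmetry of P.
    for (let i = 0; i < n; i++)
      for (let j = 0; j < n; j++)
        this.P[i][j] = (this.P[i][j] - gain[i] * Px[j]) / lam;
  }
}

// Usage: learn y = 2*x1 - 3*x2 from a stream of noiseless samples.
const rls = new RlsSketch({ dim: 2, regFact: 1e-3, forgetFact: 1.0 });
const samples = [[1, 0], [0, 1], [1, 1], [2, -1]];
for (let t = 0; t < 200; t++) {
  const x = samples[t % samples.length];
  rls.partialFit(x, 2 * x[0] - 3 * x[1]);
}
console.log(rls.w);
```

With forgetFact = 1, this recursion reproduces the batch solution of (regFact * I + X^T X) w = X^T y after every update. Values below 1 down-weight old samples geometrically.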

inner

RecSysParam Object

An object used for the construction of module:analytics.RecommenderSys.

Properties

Name	Type	Optional	Description
iter	number	Yes	The maximum number of iterations. Defaults to `10000`.
k	number	Yes	The number of centroids. Defaults to `2`.
tol	number	Yes	The tolerance. Defaults to `1e-3`.
verbose	boolean	Yes	If false, the console output is suppressed. Defaults to `false`.

inner

ridgeRegParam Object

An object used for the construction of module:analytics.RidgeReg.

Property

Name Type Optional Description

gamma

number

Yes

The gamma value (the regularization strength).

Defaults to 0.0.
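Assume `gamma` is the L2 penalty weight. Ridge regression then has the closed form w = (X^T X + gamma * I)^{-1} X^T y, and gamma = 0 gives ordinary least squares. The sketch below solves that system with Gaussian elimination. The helpers `ridgeSketch` and `solve` are hypothetical and are not the qminer RidgeReg API.

```javascript
// Gaussian elimination with partial pivoting for an n x n system A x = b.
function solve(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]); // augmented matrix
  for (let c = 0; c < n; c++) {
    let piv = c;
    for (let r = c + 1; r < n; r++) if (Math.abs(M[r][c]) > Math.abs(M[piv][c])) piv = r;
    [M[c], M[piv]] = [M[piv], M[c]];
    for (let r = c + 1; r < n; r++) {
      const f = M[r][c] / M[c][c];
      for (let k = c; k <= n; k++) M[r][k] -= f * M[c][k];
    }
  }
  const x = new Array(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let s = M[r][n];
    for (let k = r + 1; k < n; k++) s -= M[r][k] * x[k];
    x[r] = s / M[r][r];
  }
  return x;
}

// Hypothetical closed-form ridge regression: (X^T X + gamma I) w = X^T y.
function ridgeSketch(X, y, gamma = 0.0) {
  const d = X[0].length;
  const A = Array.from({ length: d }, (_, i) =>
    Array.from({ length: d }, (_, j) =>
      X.reduce((s, row) => s + row[i] * row[j], 0) + (i === j ? gamma : 0)));
  const b = Array.from({ length: d }, (_, i) => X.reduce((s, row, r) => s + row[i] * y[r], 0));
  return solve(A, b);
}

// Usage: y = x1 + 2 * x2 exactly; a positive gamma shrinks the coefficients.
const Xr = [[1, 0], [0, 1], [1, 1]];
const yr = [1, 2, 3];
const wOls = ridgeSketch(Xr, yr, 0.0);
const wRidge = ridgeSketch(Xr, yr, 1.0);
console.log(wOls, wRidge);
```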

inner

SVMParam Object

SVM constructor parameters. Used for the construction of module:analytics.SVC and module:analytics.SVR.

Properties

Name	Type	Optional	Description
algorithm	string	Yes	The algorithm procedure. Possible options are `'SGD'` and `'LIBSVM'`. `'PR_LOQO'` is no longer supported. Defaults to `'SGD'`.
c	number	Yes	Cost parameter. Increasing the parameter forces the model to fit the training data more accurately (setting it too large may lead to overfitting). Defaults to `1.0`.
j	number	Yes	Unbalance parameter. Increasing it gives more weight to the positive examples (getting a better fit on the positive training examples gets a higher priority). Setting j=n is like adding n-1 copies of the positive training examples to the data set. Defaults to `1.0`.
eps	number	Yes	Epsilon insensitive loss parameter. Larger values result in fewer support vectors (smaller model complexity). Defaults to `1e-3`.
batchSize	number	Yes	Number of examples used in the subgradient estimation. A higher number of samples slows down the algorithm but makes the local steps more accurate. Defaults to `1000`.
maxIterations	number	Yes	Maximum number of iterations. Defaults to `10000`.
maxTime	number	Yes	Maximum runtime in seconds. Defaults to `1`.
minDiff	number	Yes	Stopping criterion tolerance. Defaults to `1e-6`.
type	string	Yes	The subalgorithm procedure in LIBSVM. Possible options are `'C_SVC'`, `'NU_SVC'` and `'ONE_CLASS'` for classification and `'EPSILON_SVR'`, `'NU_SVR'` and `'ONE_CLASS'` for regression. Defaults to `'C_SVC'`.
kernel	string	Yes	Kernel type in LIBSVM. Possible options are `'LINEAR'`, `'POLY'`, `'RBF'`, `'SIGMOID'` and `'PRECOMPUTED'`. Defaults to `'LINEAR'`.
gamma	number	Yes	Gamma parameter in LIBSVM. Sets gamma in the kernel function. Defaults to `1.0`.
p	number	Yes	P parameter in LIBSVM. Sets the epsilon in the loss function of epsilon-SVR. Defaults to `1e-1`.
degree	number	Yes	Degree parameter in LIBSVM. Sets the degree in the kernel function. Defaults to `1`.
nu	number	Yes	Nu parameter in LIBSVM. Sets the parameter nu of nu-SVC, one-class SVM, and nu-SVR. Defaults to `1e-2`.
coef0	number	Yes	Coef0 parameter in LIBSVM. Sets coef0 in the kernel function. Defaults to `1.0`.
cacheSize	number	Yes	Cache memory size in MB used by LIBSVM. Defaults to `100`.
verbose	boolean	Yes	Toggle verbose output in the console. Defaults to `false`.
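The cost parameter `c` and the unbalance parameter `j` fit a weighted hinge-loss objective: 0.5 * ||w||^2 + c * sum_i weight_i * max(0, 1 - y_i * (w . x_i)), where weight_i = j for positive examples and 1 otherwise. The sketch below minimises that objective with full-batch subgradient descent. It is an assumption for illustration: it has no bias term, and it reproduces neither qminer's 'SGD' procedure (which uses `batchSize`, `maxTime` and `minDiff`) nor LIBSVM. The function `linearSvmSketch` is hypothetical.

```javascript
// Hypothetical linear SVM trained by full-batch subgradient descent (no bias term).
function linearSvmSketch(X, y, { c = 1.0, j = 1.0, maxIterations = 1000 } = {}) {
  const d = X[0].length;
  let w = new Array(d).fill(0);
  for (let t = 1; t <= maxIterations; t++) {
    const grad = w.slice(); // gradient of the 0.5 * ||w||^2 term
    X.forEach((x, i) => {
      const margin = y[i] * x.reduce((s, v, k) => s + v * w[k], 0);
      if (margin < 1) {
        const weight = y[i] > 0 ? j : 1; // unbalance: positives weighted by j
        for (let k = 0; k < d; k++) grad[k] -= c * weight * y[i] * x[k];
      }
    });
    const step = 1 / (t + 10); // decreasing step size
    w = w.map((wk, k) => wk - step * grad[k]);
  }
  return w;
}

// Usage: two linearly separable classes on either side of the origin.
const svmX = [[2, 2], [3, 1], [-2, -2], [-1, -3]];
const svmY = [1, 1, -1, -1];
const svmW = linearSvmSketch(svmX, svmY, { c: 1.0, j: 1.0, maxIterations: 1000 });
console.log(svmX.map(x => Math.sign(x[0] * svmW[0] + x[1] * svmW[1])));
```

Raising `c` increases the pull of margin violations relative to the `||w||^2` term. Raising `j` does the same for positive examples only.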

inner

TDigestParam Object

An object used for the construction of module:analytics.quantiles.TDigest.

Properties

Name Type Optional Description

minCount

number

Yes

The minimal number of examples before the model is initialized.

Defaults to 0.

clusters

number

Yes

The number of 1-d clusters (large values lead to higher memory usage).

Defaults to 100.

inner

tokenizerParam Object

An object used for the construction of module:analytics.Tokenizer.

Property

Name Type Optional Description

type

string

Yes

The type of the tokenizer. The different types are:
1. 'simple' - Breaks on white space.
2. 'html' - Breaks on white space and ignores HTML tags.
3. 'unicode' - Breaks on white space and normalizes unicode letters, e.g. čšž changes to csz.

Defaults to 'unicode'.
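The sketch below imitates the three tokenizer types using built-in JavaScript string functions. For 'unicode', it decomposes letters with NFD and strips combining diacritical marks, so that "čšž" becomes "csz". The function `tokenizeSketch` and its regular expressions are assumptions, not qminer's Tokenizer implementation.

```javascript
// Hypothetical tokenizer sketch mirroring the 'simple', 'html' and 'unicode' types.
function tokenizeSketch(text, type = 'unicode') {
  let s = text;
  if (type === 'html') {
    s = s.replace(/<[^>]*>/g, ' '); // drop html tags
  } else if (type === 'unicode') {
    // Decompose letters (NFD) and remove combining diacritical marks.
    s = s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  }
  return s.split(/\s+/).filter(tok => tok.length > 0); // break on white space
}

console.log(tokenizeSketch('<b>Hello</b> world', 'html'));
console.log(tokenizeSketch('čšž test', 'unicode'));
```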