API Reference

class sdtf.StreamDecisionForest(n_estimators=100, splitter='best', max_features='sqrt', bootstrap=True, n_jobs=None, max_samples=None, n_swaps=1)

A class used to represent a naive ensemble of random stream decision trees.

Parameters:
n_estimators : int, default=100

The number of stream decision trees in the forest.

splitter : {"best", "random"}, default="best"

The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.

max_features : {"sqrt", "log2"}, int or float, default="sqrt"

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and round(max_features * n_features) features are considered at each split.

  • If "sqrt", then max_features=sqrt(n_features).

  • If "log2", then max_features=log2(n_features).

  • If None, then max_features=n_features.
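The rules above can be sketched as a small helper that turns a max_features value into a feature count. This is an illustration only, not the library's code: the function name resolve_max_features is hypothetical, and truncating the sqrt and log2 results to an integer and clamping the count to [1, n_features] are assumptions of this sketch.

```python
import math


def resolve_max_features(max_features, n_features):
    """Return how many features to consider at each split.

    Hypothetical helper: integer truncation for "sqrt"/"log2" and the
    final clamp to [1, n_features] are assumptions, not documented behavior.
    """
    if max_features is None:
        count = n_features
    elif max_features == "sqrt":
        count = int(math.sqrt(n_features))
    elif max_features == "log2":
        count = int(math.log2(n_features))
    elif isinstance(max_features, int):
        count = max_features
    elif isinstance(max_features, float):
        # Documented rule: a float is a fraction of the feature count.
        count = round(max_features * n_features)
    else:
        raise ValueError(f"Unsupported max_features: {max_features!r}")
    return max(1, min(count, n_features))
```

For example, with 16 input features both "sqrt" and "log2" resolve to 4 features per split under these assumptions.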

bootstrap : bool, default=True

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

n_jobs : int, default=None

The number of jobs to run in parallel.

max_samples : int or float, default=None

If bootstrap is True, the number of samples to draw from X to train each base estimator:

  • If None (default), then draw X.shape[0] samples.

  • If int, then draw max_samples samples.

  • If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0.0, 1.0].
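As a rough illustration of the max_samples rules, the helper below computes how many rows would be drawn for one bootstrap sample. The name resolve_n_samples is hypothetical, and rounding the float case to the nearest integer (with a floor of one row) is an assumption, since the reference only gives the product max_samples * X.shape[0].

```python
def resolve_n_samples(max_samples, n_rows):
    """Return the number of rows to draw for one bootstrap sample.

    Hypothetical helper; the rounding in the float case is an assumption.
    """
    if max_samples is None:
        return n_rows
    if isinstance(max_samples, int):
        return max_samples
    if isinstance(max_samples, float):
        if not 0.0 < max_samples <= 1.0:
            raise ValueError("a float max_samples must lie in (0.0, 1.0]")
        return max(1, round(max_samples * n_rows))
    raise TypeError(f"Unsupported max_samples: {max_samples!r}")
```

With a batch of 200 rows, max_samples=0.25 and max_samples=50 both request 50 rows in this sketch.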

n_swaps : int, default=1

The number of trees to swap at each partial fitting. The actual swaps occur with 1/n_batches_ probability.
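The 1/n_batches_ swap probability means a swap is certain on the first batch and becomes rarer as more batches arrive. The sketch below models only that probability. How trees are chosen and replaced is not covered in this reference, so it is left out, and the function name swap_happens is hypothetical.

```python
import random


def swap_happens(n_batches, rng):
    """Return True with probability 1 / n_batches.

    Hypothetical helper: it models only the documented probability,
    not the tree replacement itself.
    """
    # rng.random() is drawn from [0.0, 1.0), so n_batches == 1 always swaps.
    return rng.random() < 1.0 / n_batches


rng = random.Random(0)
# Empirical swap frequency after 4 batches; expected to be close to 0.25.
frequency = sum(swap_happens(4, rng) for _ in range(10_000)) / 10_000
```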

Attributes:
estimators_ : list of sklearn.tree.DecisionTreeClassifier

An internal list that contains all sklearn.tree.DecisionTreeClassifier instances.

classes_ : list of all unique class labels

An internal list that stores class labels after the first call to partial_fit.

n_batches_ : int

The number of batches seen with partial_fit.

Methods

fit(X, y[, classes])

Partially fits the forest to data X with labels y.

partial_fit(X, y[, classes])

Partially fits the forest to data X with labels y.

predict(X)

Performs inference using the forest.

fit(X, y, classes=None)

Partially fits the forest to data X with labels y.

Parameters:
X : ndarray

Input data matrix.

y : ndarray

Output (i.e. response) data matrix.

classes : ndarray, default=None

List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.

Returns:
self : StreamDecisionForest

The object itself.

partial_fit(X, y, classes=None)

Partially fits the forest to data X with labels y.

Parameters:
X : ndarray

Input data matrix.

y : ndarray

Output (i.e. response) data matrix.

classes : ndarray, default=None

List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.

Returns:
self : StreamDecisionForest

The object itself.
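A minimal sketch of the calling pattern described above: data is fed batch by batch, and classes is passed only on the first call. The helper stream_fit and the stand-in RecordingModel are hypothetical and exist only so the example runs without the library. Any object exposing partial_fit(X, y, classes=None) could take the stand-in's place.

```python
class RecordingModel:
    """Hypothetical stand-in that records each partial_fit call."""

    def __init__(self):
        self.calls = []

    def partial_fit(self, X, y, classes=None):
        self.calls.append((len(X), classes))
        return self


def stream_fit(model, X, y, batch_size, classes):
    """Feed (X, y) to model.partial_fit one batch at a time."""
    for start in range(0, len(X), batch_size):
        stop = start + batch_size
        if start == 0:
            # The class list is required on the first call only.
            model.partial_fit(X[start:stop], y[start:stop], classes=classes)
        else:
            model.partial_fit(X[start:stop], y[start:stop])
    return model


model = stream_fit(
    RecordingModel(),
    X=[[0.0], [1.0], [2.0], [3.0], [4.0]],
    y=[0, 1, 0, 1, 0],
    batch_size=2,
    classes=[0, 1],
)
```

With the library installed, the same helper could presumably be called with sdtf.StreamDecisionForest() as the model and NumPy arrays for X and y.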

predict(X)

Performs inference using the forest.

Parameters:
X : ndarray

Input data matrix.

Returns:
major_result : ndarray

The majority predictions.
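One way to read "majority predictions" is a per-sample vote over the labels predicted by each tree. The sketch below shows that idea in plain Python. It assumes hard-label voting, which this reference does not confirm, and the name majority_vote and its tie-breaking rule (the label seen first wins) are hypothetical.

```python
from collections import Counter


def majority_vote(per_tree_predictions):
    """Combine per-tree label predictions into one label per sample.

    per_tree_predictions holds one equal-length label sequence per tree.
    Hypothetical helper: hard-label voting is an assumption of this sketch.
    """
    n_samples = len(per_tree_predictions[0])
    result = []
    for i in range(n_samples):
        votes = Counter(tree[i] for tree in per_tree_predictions)
        # most_common(1) returns the label with the highest vote count.
        result.append(votes.most_common(1)[0][0])
    return result


combined = majority_vote([[0, 1, 1], [0, 1, 0], [1, 1, 0]])
```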

class sdtf.CascadeStreamForest(n_estimators=100, splitter='best', max_features='sqrt', bootstrap=True, n_jobs=None, max_samples=None)

A class used to represent a cascading ensemble of stream decision trees.

Parameters:
n_estimators : int, default=100

The maximum number of stream decision trees in the forest.

splitter : {"best", "random"}, default="best"

The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.

max_features : {"sqrt", "log2"}, int or float, default="sqrt"

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and round(max_features * n_features) features are considered at each split.

  • If "sqrt", then max_features=sqrt(n_features).

  • If "log2", then max_features=log2(n_features).

  • If None, then max_features=n_features.

bootstrap : bool, default=True

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

n_jobs : int, default=None

The number of jobs to run in parallel.

max_samples : int or float, default=None

If bootstrap is True, the number of samples to draw from X to train each base estimator:

  • If None (default), then draw X.shape[0] samples.

  • If int, then draw max_samples samples.

  • If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0.0, 1.0].

Attributes:
estimators_ : list of sklearn.tree.DecisionTreeClassifier

An internal list that contains the cascading sklearn.tree.DecisionTreeClassifier instances.

Methods

fit(X, y[, classes])

Partially fits the forest to data X with labels y.

partial_fit(X, y[, classes])

Partially fits the forest to data X with labels y.

predict(X)

Performs inference using the forest.

fit(X, y, classes=None)

Partially fits the forest to data X with labels y.

Parameters:
X : ndarray

Input data matrix.

y : ndarray

Output (i.e. response) data matrix.

classes : ndarray, default=None

List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.

Returns:
self : CascadeStreamForest

The object itself.

partial_fit(X, y, classes=None)

Partially fits the forest to data X with labels y.

Parameters:
X : ndarray

Input data matrix.

y : ndarray

Output (i.e. response) data matrix.

classes : ndarray, default=None

List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.

Returns:
self : CascadeStreamForest

The object itself.

predict(X)

Performs inference using the forest.

Parameters:
X : ndarray

Input data matrix.

Returns:
major_result : ndarray

The majority predictions.