API Reference

class sdtf.StreamDecisionForest(n_estimators=100, splitter='best', max_features='sqrt', bootstrap=True, n_jobs=None, max_samples=None, n_swaps=1)

A class used to represent a naive ensemble of random stream decision trees.

Parameters:

n_estimators (int, default=100)

The number of stream decision trees in the forest.

splitter ({"best", "random"}, default="best")

The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.

max_features ({"sqrt", "log2", None}, int or float, default="sqrt")

The number of features to consider when looking for the best split:

- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and round(max_features * n_features) features are considered at each split.
- If "sqrt", then max_features=sqrt(n_features).
- If "log2", then max_features=log2(n_features).
- If None, then max_features=n_features.

bootstrap (bool, default=True)

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

n_jobs (int, default=None)

The number of jobs to run in parallel.

max_samples (int or float, default=None)

If bootstrap is True, the number of samples to draw from X to train each base estimator:

- If None (default), then draw X.shape[0] samples.
- If int, then draw max_samples samples.
- If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0.0, 1.0].

n_swaps (int, default=1)

The number of trees to swap at each call to partial_fit. A swap actually occurs with probability 1/n_batches_.
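The max_features and max_samples rules listed above can be sketched as two small helpers. The names resolve_max_features and resolve_max_samples are hypothetical and are not part of sdtf. Truncating to an integer and keeping at least one feature or sample are assumptions borrowed from common scikit-learn practice, not behavior confirmed by this reference.

```python
import math


def resolve_max_features(max_features, n_features):
    """Return the number of features examined at each split (hypothetical helper)."""
    if max_features is None:
        return n_features
    if max_features == "sqrt":
        # Assumption: truncate to an integer and keep at least one feature.
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # A float is read as a fraction of the available features.
        return max(1, round(max_features * n_features))
    raise ValueError(f"Unsupported max_features: {max_features!r}")


def resolve_max_samples(max_samples, n_samples):
    """Return the number of samples drawn per tree (hypothetical helper)."""
    if max_samples is None:
        return n_samples
    if isinstance(max_samples, int):
        return max_samples
    if isinstance(max_samples, float):
        if not 0.0 < max_samples <= 1.0:
            raise ValueError("A float max_samples must lie in (0.0, 1.0]")
        # Assumption: truncate rather than round, and draw at least one sample.
        return max(1, int(max_samples * n_samples))
    raise ValueError(f"Unsupported max_samples: {max_samples!r}")


print(resolve_max_features("sqrt", 16))  # 4
print(resolve_max_features(0.5, 10))     # 5
print(resolve_max_samples(0.25, 200))    # 50
```

Checking the string options before the numeric ones keeps the branches unambiguous, since a string never passes the int or float checks.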

Attributes:

estimators_ (list of sklearn.tree.DecisionTreeClassifier): An internal list that contains all sklearn.tree.DecisionTreeClassifier instances.
classes_ (list): All unique class labels, stored after the first call to partial_fit.
n_batches_ (int): The number of batches seen with partial_fit.
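The n_swaps description ties the swap probability to n_batches_, so swaps become rarer as more batches are seen. A minimal sketch of that decision rule follows. The function should_swap is hypothetical, and how sdtf selects and replaces trees is not shown here because this reference does not describe it.

```python
import random


def should_swap(n_batches, rng=random):
    """Return True with probability 1/n_batches (hypothetical helper)."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    # rng.random() is uniform on [0, 1), so the comparison succeeds
    # with probability 1/n_batches.
    return rng.random() < 1.0 / n_batches


# Empirical check: after 4 batches, roughly a quarter of the draws trigger a swap.
rng = random.Random(0)
trials = 10_000
swap_rate = sum(should_swap(4, rng) for _ in range(trials)) / trials
print(f"observed swap rate: {swap_rate:.3f}")
```

With n_batches equal to 1 the rule always fires, which matches the 1/n_batches_ formula for the first batch.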

Methods

`fit`(X, y[, classes])	Partially fits the forest to data X with labels y.
`partial_fit`(X, y[, classes])	Partially fits the forest to data X with labels y.
`predict`(X)	Performs inference using the forest.

fit(X, y, classes=None)

Partially fits the forest to data X with labels y.

Parameters:

X (ndarray): Input data matrix.
y (ndarray): Output (i.e., response) data matrix.
classes (ndarray, default=None): List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.

Returns:

self (StreamDecisionForest): The object itself.

partial_fit(X, y, classes=None)

Partially fits the forest to data X with labels y.

Parameters:

X (ndarray): Input data matrix.
y (ndarray): Output (i.e., response) data matrix.
classes (ndarray, default=None): List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.

Returns:

self (StreamDecisionForest): The object itself.
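The calling pattern for partial_fit can be illustrated with a stand-in object that mimics only the classes rule stated above. ClassesContractDemo and iter_batches are hypothetical names, the stand-in does no learning, and raising ValueError when classes is missing on the first call is an assumption rather than documented sdtf behavior.

```python
class ClassesContractDemo:
    """Hypothetical stand-in that mimics only the `classes` rule of partial_fit."""

    def __init__(self):
        self.classes_ = None
        self.n_batches_ = 0

    def partial_fit(self, X, y, classes=None):
        if self.classes_ is None:
            if classes is None:
                # Assumption: the first call fails without the full label set.
                raise ValueError("classes must be provided at the first call to partial_fit")
            self.classes_ = list(classes)
        self.n_batches_ += 1
        return self  # returning self allows chained calls


def iter_batches(X, y, batch_size):
    """Yield consecutive (X, y) slices of at most batch_size rows."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


X = [[0.1, 1.0], [0.4, 0.8], [0.9, 0.2], [0.7, 0.1], [0.2, 0.9]]
y = [0, 0, 1, 1, 0]

model = ClassesContractDemo()
for i, (X_batch, y_batch) in enumerate(iter_batches(X, y, batch_size=2)):
    if i == 0:
        # Only the first batch needs the full set of labels.
        model.partial_fit(X_batch, y_batch, classes=[0, 1])
    else:
        model.partial_fit(X_batch, y_batch)

print(model.n_batches_)  # 3
```

Passing every possible label up front matters because an early batch may not contain all classes.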

predict(X)

Performs inference using the forest.

Parameters:

X (ndarray): Input data matrix.

Returns:

major_result (ndarray): The majority predictions.
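A majority prediction of this kind can be sketched with plain Python. The helper majority_vote is hypothetical, it works on lists rather than ndarrays for brevity, and its tie-breaking (the label seen first wins) is an assumption, since this reference does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(per_tree_predictions):
    """Combine per-tree label lists into one majority label per sample."""
    # zip(*...) regroups the labels so that each tuple holds
    # every tree's vote for a single sample.
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*per_tree_predictions)
    ]


per_tree_predictions = [
    [0, 1, 1, 2],  # labels predicted by tree 1 for four samples
    [0, 1, 0, 2],  # tree 2
    [1, 1, 0, 2],  # tree 3
]
major_result = majority_vote(per_tree_predictions)
print(major_result)  # [0, 1, 0, 2]
```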

class sdtf.CascadeStreamForest(n_estimators=100, splitter='best', max_features='sqrt', bootstrap=True, n_jobs=None, max_samples=None)

A class used to represent a cascading ensemble of stream decision trees.

Parameters:

n_estimators (int, default=100)

The maximum number of stream decision trees in the forest.

splitter ({"best", "random"}, default="best")

The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.

max_features ({"sqrt", "log2", None}, int or float, default="sqrt")

The number of features to consider when looking for the best split:

- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and round(max_features * n_features) features are considered at each split.
- If "sqrt", then max_features=sqrt(n_features).
- If "log2", then max_features=log2(n_features).
- If None, then max_features=n_features.

bootstrap (bool, default=True)

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

n_jobs (int, default=None)

The number of jobs to run in parallel.

max_samples (int or float, default=None)

If bootstrap is True, the number of samples to draw from X to train each base estimator:

- If None (default), then draw X.shape[0] samples.
- If int, then draw max_samples samples.
- If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0.0, 1.0].

Attributes:

estimators_ (list of sklearn.tree.DecisionTreeClassifier): An internal list that contains the cascading sklearn.tree.DecisionTreeClassifier instances.

Methods

`fit`(X, y[, classes])	Partially fits the forest to data X with labels y.
`partial_fit`(X, y[, classes])	Partially fits the forest to data X with labels y.
`predict`(X)	Performs inference using the forest.

fit(X, y, classes=None)

Partially fits the forest to data X with labels y.

Parameters:

X (ndarray): Input data matrix.
y (ndarray): Output (i.e., response) data matrix.
classes (ndarray, default=None): List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.

Returns:

self (CascadeStreamForest): The object itself.

partial_fit(X, y, classes=None)

Partially fits the forest to data X with labels y.

Parameters:

X (ndarray): Input data matrix.
y (ndarray): Output (i.e., response) data matrix.
classes (ndarray, default=None): List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.

Returns:

self (CascadeStreamForest): The object itself.

predict(X)

Performs inference using the forest.

Parameters:

X (ndarray): Input data matrix.

Returns:

major_result (ndarray): The majority predictions.