API Reference
- class sdtf.StreamDecisionForest(n_estimators=100, splitter='best', max_features='sqrt', bootstrap=True, n_jobs=None, max_samples=None, n_swaps=1)
A class used to represent a naive ensemble of random stream decision trees.
- Parameters:
- n_estimators : int, default=100
The number of stream decision trees in the forest.
- splitter : {"best", "random"}, default="best"
The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.
- max_features : {"sqrt", "log2", None}, int or float, default="sqrt"
The number of features to consider when looking for the best split:
  - If int, then consider max_features features at each split.
  - If float, then max_features is a fraction and round(max_features * n_features) features are considered at each split.
  - If "sqrt", then max_features=sqrt(n_features).
  - If "log2", then max_features=log2(n_features).
  - If None, then max_features=n_features.
- bootstrap : bool, default=True
Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
- n_jobs : int, default=None
The number of jobs to run in parallel.
- max_samples : int or float, default=None
If bootstrap is True, the number of samples to draw from X to train each base estimator:
  - If None (default), then draw X.shape[0] samples.
  - If int, then draw max_samples samples.
  - If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0.0, 1.0].
- n_swaps : int, default=1
The number of trees to swap at each call to partial_fit. A swap actually occurs with probability 1/n_batches_.
- Attributes:
- estimators_ : list of sklearn.tree.DecisionTreeClassifier
An internal list that contains all sklearn.tree.DecisionTreeClassifier instances in the forest.
- classes_ : list of all unique class labels
An internal list that stores class labels after the first call to partial_fit.
- n_batches_ : int
The number of batches seen with partial_fit.
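The max_features and max_samples rules listed above can be sketched as two small helpers. This is an illustrative sketch, not the library's code: the helper names resolve_max_features and resolve_max_samples are hypothetical, and truncating the "sqrt" and "log2" results to an integer and rounding the float case of max_samples are assumptions, since this reference does not say how fractional results are handled.

```python
import math


def resolve_max_features(max_features, n_features):
    """Return how many features are examined per split (hypothetical helper)."""
    if max_features is None:
        return n_features
    if max_features == "sqrt":
        # Assumption: the square root is truncated to an integer.
        return int(math.sqrt(n_features))
    if max_features == "log2":
        # Assumption: the logarithm is truncated to an integer.
        return int(math.log2(n_features))
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # Documented rule: round(max_features * n_features).
        return round(max_features * n_features)
    raise ValueError(f"unsupported max_features: {max_features!r}")


def resolve_max_samples(max_samples, n_samples):
    """Return how many samples are drawn per tree (hypothetical helper)."""
    if max_samples is None:
        return n_samples
    if isinstance(max_samples, int):
        return max_samples
    if isinstance(max_samples, float):
        if not 0.0 < max_samples <= 1.0:
            raise ValueError("a float max_samples must lie in (0.0, 1.0]")
        # Assumption: the product is rounded to the nearest integer.
        return round(max_samples * n_samples)
    raise ValueError(f"unsupported max_samples: {max_samples!r}")


print(resolve_max_features("sqrt", 16))
print(resolve_max_samples(0.25, 200))
```

Keeping the two resolutions separate mirrors the parameter list: max_features depends on the number of columns of X, while max_samples depends on the number of rows and only matters when bootstrap is True.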
Methods
- fit(X, y[, classes]): Partially fits the forest to data X with labels y.
- partial_fit(X, y[, classes]): Partially fits the forest to data X with labels y.
- predict(X): Performs inference using the forest.
- fit(X, y, classes=None)
Partially fits the forest to data X with labels y.
- Parameters:
- X : ndarray
Input data matrix.
- y : ndarray
Output (i.e., response) data matrix.
- classes : ndarray, default=None
List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.
- Returns:
- self : StreamDecisionForest
The object itself.
- partial_fit(X, y, classes=None)
Partially fits the forest to data X with labels y.
- Parameters:
- X : ndarray
Input data matrix.
- y : ndarray
Output (i.e., response) data matrix.
- classes : ndarray, default=None
List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.
- Returns:
- self : StreamDecisionForest
The object itself.
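The n_swaps parameter states that swaps occur with probability 1/n_batches_. A minimal sketch of that coin flip follows, assuming n_batches_ already counts the current batch. The helper name should_swap is hypothetical, and how trees are selected and replaced is not specified in this reference, so that part is not modelled.

```python
import random


def should_swap(n_batches, rng):
    """Return True with probability 1 / n_batches (hypothetical helper)."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    # rng.random() is uniform on [0.0, 1.0), so the result is always True
    # when n_batches == 1.
    return rng.random() < 1.0 / n_batches


rng = random.Random(0)
trials = 10_000
swap_rate = sum(should_swap(4, rng) for _ in range(trials)) / trials
print(f"observed swap rate for n_batches=4: {swap_rate:.3f}")
```

Under this reading, swaps become rarer as more batches arrive, because the probability shrinks as n_batches_ grows.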
- predict(X)
Performs inference using the forest.
- Parameters:
- X : ndarray
Input data matrix.
- Returns:
- major_result : ndarray
The majority predictions.
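As an illustration of what "majority predictions" means, the sketch below takes one list of predicted labels per tree and returns the most common label for each sample. It is written in plain Python under stated assumptions: the function name majority_vote is hypothetical, and the way the library breaks ties is not documented here, so this version simply keeps the first label to reach the highest count.

```python
from collections import Counter


def majority_vote(per_tree_predictions):
    """Combine per-tree label lists into one label per sample (hypothetical helper).

    per_tree_predictions holds one inner list per tree; each inner list has
    one predicted label per sample.
    """
    n_samples = len(per_tree_predictions[0])
    result = []
    for i in range(n_samples):
        # Count the label each tree assigned to sample i.
        votes = Counter(tree_preds[i] for tree_preds in per_tree_predictions)
        # Assumption: ties go to the label counted first.
        result.append(votes.most_common(1)[0][0])
    return result


# Three trees, three samples.
votes_by_tree = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 0],
]
major_result = majority_vote(votes_by_tree)
print(major_result)  # prints [0, 1, 0]
```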
- class sdtf.CascadeStreamForest(n_estimators=100, splitter='best', max_features='sqrt', bootstrap=True, n_jobs=None, max_samples=None)
A class used to represent a cascading ensemble of stream decision trees.
- Parameters:
- n_estimators : int, default=100
The maximum number of stream decision trees.
- splitter : {"best", "random"}, default="best"
The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.
- max_features : {"sqrt", "log2", None}, int or float, default="sqrt"
The number of features to consider when looking for the best split:
  - If int, then consider max_features features at each split.
  - If float, then max_features is a fraction and round(max_features * n_features) features are considered at each split.
  - If "sqrt", then max_features=sqrt(n_features).
  - If "log2", then max_features=log2(n_features).
  - If None, then max_features=n_features.
- bootstrap : bool, default=True
Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
- n_jobs : int, default=None
The number of jobs to run in parallel.
- max_samples : int or float, default=None
If bootstrap is True, the number of samples to draw from X to train each base estimator:
  - If None (default), then draw X.shape[0] samples.
  - If int, then draw max_samples samples.
  - If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0.0, 1.0].
- Attributes:
- estimators_ : list of sklearn.tree.DecisionTreeClassifier
An internal list that contains the cascading sklearn.tree.DecisionTreeClassifier instances.
Methods
- fit(X, y[, classes]): Partially fits the forest to data X with labels y.
- partial_fit(X, y[, classes]): Partially fits the forest to data X with labels y.
- predict(X): Performs inference using the forest.
- fit(X, y, classes=None)
Partially fits the forest to data X with labels y.
- Parameters:
- X : ndarray
Input data matrix.
- y : ndarray
Output (i.e., response) data matrix.
- classes : ndarray, default=None
List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.
- Returns:
- self : CascadeStreamForest
The object itself.
- partial_fit(X, y, classes=None)
Partially fits the forest to data X with labels y.
- Parameters:
- X : ndarray
Input data matrix.
- y : ndarray
Output (i.e., response) data matrix.
- classes : ndarray, default=None
List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.
- Returns:
- self : CascadeStreamForest
The object itself.
- predict(X)
Performs inference using the forest.
- Parameters:
- X : ndarray
Input data matrix.
- Returns:
- major_result : ndarray
The majority predictions.