Machine_Learning_and_AI

Multi-label vs multi-class classification

There's a difference:

  • Multi-class classification: the target can have more than two classes, but each sample may belong to only one class
  • Multi-label classification: the target can have more than two classes, and each sample may belong to more than one class at the same time

Scikit-learn handles both through the same API (OneVsRestClassifier); the main practical difference is the format of the training targets:

https://scikit-learn.org/stable/modules/multiclass.html#one-vs-the-rest
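A minimal sketch of the two target formats; the feature matrix and labels below are made-up toy data:

    # Hedged sketch: same OneVsRestClassifier API, different target formats (toy data for illustration).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])

    # Multi-class: y is a 1-D array, each sample belongs to exactly one class.
    y_multiclass = np.array([0, 1, 2, 1])
    OneVsRestClassifier(LogisticRegression()).fit(X, y_multiclass)

    # Multi-label: y is a binary indicator matrix, each sample can belong to several classes at once.
    y_multilabel = np.array([[1, 0, 1],
                             [0, 1, 0],
                             [1, 1, 0],
                             [0, 0, 1]])
    OneVsRestClassifier(LogisticRegression()).fit(X, y_multilabel)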

Correspondence Analysis interpretation

When reviewing the results of correspondence analysis, remember:

  1. The further a feature lies from the origin, the more discriminative or distinct that feature is
  2. When comparing a row feature to a column feature, similarity is gauged by how small the angle between the two points is (a cosine measure), not by the distance between the points themselves.

For example:

Associated, but not a strong association: [image]

Associated, with a strong association: [image]

PS:

  • small angles indicate association
  • 90 degree angles indicate no relationship
  • 180 degree angles indicate negative associations
  3. If, on the other hand, you compare two row features with each other (or two column features), proximity/distance is a good measure of similarity
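A small illustration of the comparison rules above, using made-up 2-D correspondence analysis coordinates (the coordinate values are assumptions, not real CA output):

    # Illustration of rules 2 and 3 with hypothetical 2-D CA coordinates.
    import numpy as np

    def angle_deg(a, b):
        # Angle between two points as seen from the origin; a small angle suggests association.
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    row_feature = np.array([1.2, 0.3])   # hypothetical row coordinate
    col_feature = np.array([0.4, 0.1])   # hypothetical column coordinate
    other_row   = np.array([1.1, 0.4])   # another hypothetical row coordinate

    # Rule 2: row vs column -> compare angles, not distances.
    print(angle_deg(row_feature, col_feature))      # small angle => associated

    # Rule 3: row vs row (or column vs column) -> plain distance is meaningful.
    print(np.linalg.norm(row_feature - other_row))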

Importance of probability calibration for classifiers

Classifiers in this note refer to:

  • classical algorithms such as SVMs or Random Forests. In this case scikit-learn's predict_proba output is sometimes used as a measure of confidence
  • neural networks, e.g. NNs ending with a sigmoid layer mapping to the class. In this case the output of the sigmoid function is sometimes used as a measure of confidence

PS: in anomaly detection, if the confidence output is sufficiently low, the data point can be considered an anomaly.

Both types of classifiers, however, sometimes produce skewed confidence measures (1)(2)

To resolve this, probability calibration is sometimes necessary. Essentially, the classifier's confidence output is passed through a regressor that has been trained on predicted vs. actual probabilities. In other words:

  1. The classifier is trained as usual
  2. For each output class, create a regressor (isotonic or sigmoid/Platt)
  3. For every training sample, pass it through the model, and for each output class record the predicted probability and the actual probability (which is usually 0 or 1)
  4. Train the regressors created in step 2 on the data collected in step 3
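In scikit-learn this whole procedure is wrapped by CalibratedClassifierCV. A minimal sketch; the SVC base estimator and the synthetic dataset are arbitrary illustrative choices:

    # Minimal calibration sketch with scikit-learn's CalibratedClassifierCV.
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    raw = SVC()  # exposes only a decision function; probabilities need calibration

    # Internally: fit the classifier, then fit an isotonic (or sigmoid) regressor
    # per class on predicted vs. actual probabilities, using cross-validation folds.
    calibrated = CalibratedClassifierCV(raw, method="isotonic", cv=3)
    calibrated.fit(X, y)

    print(calibrated.predict_proba(X[:5]))  # calibrated confidence estimates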

The skip-thought encoder

Similar to skip-gram in word2vec. However, in this case a sentence is passed through an RNN encoder, whose output is fed simultaneously into:

  • a forward-thought RNN decoder: predicts the next sentence
  • a backward-thought RNN decoder: predicts the previous sentence

[image]
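A hedged architectural sketch in Keras. Vocabulary size, sequence length, and RNN sizes are made-up values, and GRU cells are one possible choice of RNN; decoders are trained with teacher forcing:

    # Sketch of a skip-thought-style encoder with forward and backward decoders (illustrative sizes).
    from tensorflow.keras import Model, layers

    vocab_size, embed_dim, rnn_units, max_len = 10000, 128, 256, 30

    # Encoder: embeds the current sentence and compresses it into a single thought vector.
    enc_in = layers.Input(shape=(max_len,), name="current_sentence")
    enc_emb = layers.Embedding(vocab_size, embed_dim)(enc_in)
    thought = layers.GRU(rnn_units, name="thought_vector")(enc_emb)

    def decoder(name):
        # Each decoder receives the thought vector as its initial state and
        # is trained (teacher forcing) to emit the neighbouring sentence.
        dec_in = layers.Input(shape=(max_len,), name=f"{name}_input")
        dec_emb = layers.Embedding(vocab_size, embed_dim)(dec_in)
        dec_seq = layers.GRU(rnn_units, return_sequences=True)(dec_emb, initial_state=thought)
        dec_out = layers.Dense(vocab_size, activation="softmax", name=f"{name}_output")(dec_seq)
        return dec_in, dec_out

    next_in, next_out = decoder("next_sentence")      # forward decoder: predicts the next sentence
    prev_in, prev_out = decoder("previous_sentence")  # backward decoder: predicts the previous sentence

    model = Model(inputs=[enc_in, next_in, prev_in], outputs=[next_out, prev_out])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.summary()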

Generative Adversarial Networks

General flow:

  1. The generator creates samples from noise. These samples are "fake"
  2. The discriminator classifies samples as real (1) or fake (0). During a training step the discriminator is fed samples of both kinds, and its weights are updated
  3. The discriminator's weights are frozen, and the labels on the generated samples are inverted (the fakes are now labelled 1, i.e. as real). The generator's weights are then updated through the combined GAN model; the inverted labels push the generator to produce better fakes. Training is done through the combined GAN so the generator can take advantage of the discriminator weights updated in step 2. A training-loop sketch follows at the end of this note.

[image]

Practical example: link. GAN visualization: link.
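A minimal Keras training-loop sketch of the three steps above; the dense layer sizes, data shapes, and hyperparameters are illustrative assumptions:

    # Minimal GAN training sketch in Keras (shapes and hyperparameters are made-up).
    import numpy as np
    from tensorflow.keras import layers, models

    latent_dim, data_dim = 32, 784  # e.g. flattened 28x28 images

    # Step 1: the generator maps noise to "fake" samples.
    generator = models.Sequential([
        layers.Dense(128, activation="relu", input_shape=(latent_dim,)),
        layers.Dense(data_dim, activation="sigmoid"),
    ])

    # Step 2: the discriminator classifies samples as real (1) or fake (0).
    discriminator = models.Sequential([
        layers.Dense(128, activation="relu", input_shape=(data_dim,)),
        layers.Dense(1, activation="sigmoid"),
    ])
    discriminator.compile(optimizer="adam", loss="binary_crossentropy")

    # Step 3: freeze the discriminator and stack it on the generator to form the combined GAN.
    # (The discriminator was compiled while trainable, so its own train_on_batch still updates it.)
    discriminator.trainable = False
    gan = models.Sequential([generator, discriminator])
    gan.compile(optimizer="adam", loss="binary_crossentropy")

    def train_step(real_batch):
        batch = real_batch.shape[0]
        noise = np.random.normal(size=(batch, latent_dim))
        fakes = generator.predict(noise, verbose=0)

        # Step 2: train the discriminator on real (label 1) and fake (label 0) samples.
        discriminator.train_on_batch(real_batch, np.ones((batch, 1)))
        discriminator.train_on_batch(fakes, np.zeros((batch, 1)))

        # Step 3: train the generator through the GAN with inverted labels (fakes labelled 1),
        # so the frozen discriminator's gradients push the generator towards better fakes.
        gan.train_on_batch(noise, np.ones((batch, 1)))

    train_step(np.random.rand(64, data_dim).astype("float32"))  # one step on dummy "real" data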

Python's "DATATABLE" package

The "datatable" package is significantly faster than pandas when dealing with large datasets as well as datasets which do not fit in memory

Bagging vs Boosting

[image]

Bagging creates subsets of the original data (sampled with or without replacement) and trains a model on each subset; the models are subsequently combined into an ensemble.

Boosting trains a series of models; each model is fed data that has been re-weighted according to the results of the previous model. Data points that were labelled incorrectly by the previous model are given a higher weight for the subsequent model, in the hope that it will classify those points correctly. The series of models is then combined into an ensemble.

Technique    Advantage
Bagging      Better vs Over-fitting
Boosting     Better vs Bias
Both         Better model stability
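A hedged comparison sketch using scikit-learn's stock implementations (BaggingClassifier and AdaBoostClassifier) on a synthetic dataset:

    # Bagging vs boosting on synthetic data; sizes and estimator counts are arbitrary.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Bagging: many trees trained on bootstrap subsets, combined into an ensemble.
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

    # Boosting: a sequence of weak trees, each re-weighting the points the previous one got wrong.
    boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

    print("bagging :", cross_val_score(bagging, X, y, cv=5).mean())
    print("boosting:", cross_val_score(boosting, X, y, cv=5).mean())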

Creme: Online algorithms library

Online algorithms are variants of the more common batch algorithms we typically use, capable of learning incrementally as new training data points arrive. They typically perform slightly worse than their batch counterparts, but use far less RAM and can easily be re-trained (updated) even while in production:

Link: https://creme-ml.github.io/
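A minimal incremental-learning sketch, assuming creme's dict-based fit_one / predict_one API; the feature names and the data stream are made up:

    # Online learning with creme: each sample is seen once, and the model is updated in place.
    from creme import linear_model, metrics

    model = linear_model.LogisticRegression()
    metric = metrics.Accuracy()

    stream = [
        ({"x1": 0.1, "x2": 1.2}, True),
        ({"x1": 0.9, "x2": 0.3}, False),
        ({"x1": 0.2, "x2": 1.0}, True),
    ]

    for x, y in stream:
        y_pred = model.predict_one(x)   # predict before learning from the sample
        metric.update(y, y_pred)
        model.fit_one(x, y)             # incremental update, no re-training from scratch

    print(metric)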

Determine feature importance in Random Forests

(scikit-learn) Both RandomForestClassifier and RandomForestRegressor have a useful attribute named "feature_importances_". This attribute returns the "weight" or "usefulness" of each feature, which allows for feature reduction. Full example:

https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html#sphx-glr-auto-examples-ensemble-plot-forest-importances-py
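A short sketch on a synthetic dataset (3 of the 10 features are informative by construction):

    # Rank features by importance with a random forest; dataset sizes are arbitrary.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, n_features=10, n_informative=3, random_state=0)

    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # Importances sum to 1; sort to see which features the forest actually relies on.
    ranking = np.argsort(forest.feature_importances_)[::-1]
    for idx in ranking:
        print(f"feature {idx}: {forest.feature_importances_[idx]:.3f}")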

Private deep learning - PySyft

PySyft uses SMPC (secure multi-party computation) to enable private machine learning by:

  • Splitting and encrypting training data between workers such that reconstruction without worker collusion is impossible (your training data is kept private)
  • Splitting and encrypting the model weights between workers (your model is kept private)
  • Splitting and encrypting the test / prediction data (the data fed into the model to get predictions is also kept private)

PySyft can be used with Keras [1][2][3]
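PySyft's own API has changed considerably between versions, so rather than guess at it, here is a plain-Python sketch of the additive secret sharing idea underlying SMPC: a value is split into random shares held by different workers, and no single share reveals anything without collusion. The field size and the 3-worker split are illustrative assumptions:

    # Toy additive secret sharing (the SMPC building block that schemes like PySyft's rely on).
    import random

    Q = 2**31 - 1  # all arithmetic happens modulo a large prime

    def share(secret, n_workers=3):
        # n-1 uniformly random shares plus one correcting share;
        # any subset smaller than n looks uniformly random on its own.
        shares = [random.randrange(Q) for _ in range(n_workers - 1)]
        shares.append((secret - sum(shares)) % Q)
        return shares

    def reconstruct(shares):
        return sum(shares) % Q

    w1, w2, w3 = share(42)            # e.g. a training value or weight, split across 3 workers
    print(w1, w2, w3)                 # individually meaningless
    print(reconstruct([w1, w2, w3]))  # 42, recoverable only if all workers collude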
