Neural networks are quite mysterious beasts. They converge most of the time, until
they don't. On some datasets they do wonders, and on others they fall far behind methods
like XGBoost or just straight up fail to learn. What is the typical solution when one gets
stuck? Read up on the literature and try out a bunch of things: lowering the learning
rate, changing the network's size, using a different optimizer, etc. Basically, relying on
luck. In my view, this is completely unacceptable.
There needs to be a clear, first-principles approach to figuring out why something failed.
Why did my electronic circuit fail? Let's run a quick simulation, probe the voltages with an
oscilloscope, cross-check, and figure out what went wrong. Why did my liquid rocket engine
explode? Let's grab the chamber pressure measurements, check whether there was a green flash
indicating liquid copper, and rerun the CFD workflows.
Likewise, why aren't people intercepting gradient flows (there is a lot of analysis around
this in function-9-linear-vs-skip) to see if they're abnormally small or large? If someone
had done this earlier, Kaiming initialization would have been discovered a lot sooner. Why
aren't people training a bunch of networks with different hyperparameters, comparing their
performance, and fitting simple equations describing the relationships between those
hyperparameters? If people had done this more often, EfficientNet and the natural-language
scaling laws could have been discovered earlier.
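Intercepting gradient flows doesn't require much machinery. Here is a minimal NumPy sketch (not taken from the notebooks, just an illustration) that backpropagates through a deep ReLU MLP with manual backprop and records the mean gradient magnitude at each layer; with a naive small init the gradient vanishes toward the early layers, while Kaiming/He init keeps it roughly stable:

```python
import numpy as np

def grad_flow(init_std, depth=10, width=256, batch=64, seed=0):
    """Forward/backward a deep ReLU MLP; return mean |dL/dh| per layer."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(0.0, init_std, (width, width)) for _ in range(depth)]
    h = rng.normal(0.0, 1.0, (batch, width))
    pre_acts = []                 # pre-activations, needed for the ReLU mask
    for W in Ws:
        z = h @ W
        pre_acts.append(z)
        h = np.maximum(z, 0.0)
    # toy loss: mean of the last activations, so the upstream grad is constant
    g = np.full_like(h, 1.0 / h.size)
    norms = []
    for W, z in zip(reversed(Ws), reversed(pre_acts)):
        g = g * (z > 0)           # backprop through ReLU
        g = g @ W.T               # backprop through the linear layer
        norms.append(np.abs(g).mean())
    return norms[::-1]            # index 0 = earliest layer

naive = grad_flow(0.01)                    # tiny random init
kaiming = grad_flow(np.sqrt(2.0 / 256))    # Kaiming/He init for ReLU

print("naive   first/last layer grad ratio:", naive[0] / naive[-1])
print("kaiming first/last layer grad ratio:", kaiming[0] / kaiming[-1])
```

Plotting these per-layer norms during training is exactly the kind of cheap diagnostic that makes "my network won't learn" a measurable problem instead of a mystery.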
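Fitting a simple equation to hyperparameter sweeps is similarly cheap. The sketch below uses synthetic, made-up loss measurements (the numbers are not from any real run) to show the standard trick: a power law loss = a * N^(-b) is linear in log-log space, so ordinary least squares recovers the exponent:

```python
import numpy as np

# Hypothetical sweep: validation loss at several model sizes N.
# In practice these come from your own training runs; here they are
# synthetic, generated from loss = a * N^(-b) plus a little noise.
rng = np.random.default_rng(1)
N = np.array([1e5, 3e5, 1e6, 3e6, 1e7])
true_a, true_b = 5.0, 0.30
loss = true_a * N ** (-true_b) * np.exp(rng.normal(0.0, 0.02, N.size))

# log L = log a - b * log N, so a degree-1 fit in log-log space
# gives back the scaling exponent.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
a_hat, b_hat = np.exp(intercept), -slope
print(f"fitted: loss ~ {a_hat:.2f} * N^(-{b_hat:.3f})")
```

Five training runs and three lines of fitting are enough to get a first estimate of how loss scales with model size on your own problem.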
Basically, there are lots of avenues, tiny unit tests, and analysis techniques that would be
very valuable for debugging your networks, and yet most people rely on luck. Hopefully this
will change soon, and these are my Jupyter notebooks that I hope will help drive that change.
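As one concrete example of such a tiny unit test, here is a finite-difference gradient check on a toy logistic regression (a generic sketch, not code from the notebooks): if the analytic gradient disagrees with the numerical one, the backward pass has a bug, no luck required:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, X, y):
    """Binary cross-entropy for logistic regression, with analytic gradient."""
    p = sigmoid(X @ w)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = (rng.random(32) < 0.5).astype(float)
w = rng.normal(size=4)

_, grad = loss_and_grad(w, X, y)

# Central finite differences: perturb each weight by +/- eps and
# compare the numerical slope against the analytic gradient.
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    e = np.zeros_like(w)
    e[i] = eps
    lp, _ = loss_and_grad(w + e, X, y)
    lm, _ = loss_and_grad(w - e, X, y)
    numeric[i] = (lp - lm) / (2 * eps)

max_err = np.max(np.abs(grad - numeric))
print("max gradient error:", max_err)
```

The same check scales to any hand-written layer: if max_err isn't tiny, you have found your bug before wasting a single full training run.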
Besides strange tests, I also experimented with a lot of architectures here, from classical
ones like MLPs and CNNs to GANs, VAEs, ViTs, Transformers, YOLO, graph networks, collaborative
filtering, and diffusion models. I'd say it's pretty diverse. Over time, I explored many more
topics than deep learning alone, including bioinformatics, molecular biology, quantum
chemistry, signal processing, electrical engineering, benchmarking, search engines, and
economic and population simulations. I'm just really curious about how everything works.