Neural networks are quite mysterious beasts. They converge most of the time, until
they don't. On some datasets they do wonders, and on others they fall far behind methods
like XGBoost or just straight up fail to learn. What is the typical solution when one gets
stuck? Read up on the literature and try out a bunch of things: lowering the learning
rate, changing the network's size, using a different optimizer, etc. Basically, relying on
luck. In my view, this is completely unacceptable.
There needs to be a clear, first-principles approach to figuring out why something failed.
Why did my electronic circuit fail? Let's run a quick simulation, probe the voltages with an
oscilloscope, cross-check, and figure out what went wrong. Why did my liquid rocket engine
explode? Let's grab the chamber pressure measurements, check whether there was a green flash
indicating liquid copper, and rerun the CFD workflows.
Likewise, why aren't people intercepting gradient flows (there is a lot of analysis around
this in function-9-linear-vs-skip) to see if they're abnormally small or large? If someone
had done this earlier, Kaiming initialization would have been discovered a lot sooner. Why
aren't people training a bunch of networks with different hyperparameters, comparing their
performance, and fitting simple equations describing the relationships between those
hyperparameters? If people had done this more often, EfficientNet and the natural-language
scaling laws could have been discovered earlier.
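Intercepting gradient flows doesn't require much machinery. Here is a minimal NumPy sketch (not taken from the notebooks, just an illustration) that backpropagates through a deep ReLU MLP with manual backprop and records the mean gradient magnitude at each layer; with a naive small init the gradient vanishes toward the early layers, while Kaiming/He init keeps it roughly stable:

```python
import numpy as np

def grad_flow(init_std, depth=10, width=256, batch=64, seed=0):
    """Forward/backward a deep ReLU MLP; return mean |dL/dh| per layer."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(0.0, init_std, (width, width)) for _ in range(depth)]
    h = rng.normal(0.0, 1.0, (batch, width))
    pre_acts = []                 # pre-activations, needed for the ReLU mask
    for W in Ws:
        z = h @ W
        pre_acts.append(z)
        h = np.maximum(z, 0.0)
    # toy loss: mean of the last activations, so the upstream grad is constant
    g = np.full_like(h, 1.0 / h.size)
    norms = []
    for W, z in zip(reversed(Ws), reversed(pre_acts)):
        g = g * (z > 0)           # backprop through ReLU
        g = g @ W.T               # backprop through the linear layer
        norms.append(np.abs(g).mean())
    return norms[::-1]            # index 0 = earliest layer

naive = grad_flow(0.01)                    # tiny random init
kaiming = grad_flow(np.sqrt(2.0 / 256))    # Kaiming/He init for ReLU

print("naive   first/last layer grad ratio:", naive[0] / naive[-1])
print("kaiming first/last layer grad ratio:", kaiming[0] / kaiming[-1])
```

Plotting these per-layer norms during training is exactly the kind of cheap diagnostic that makes "my network won't learn" a measurable problem instead of a mystery.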
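Fitting a simple equation to hyperparameter sweeps is similarly cheap. The sketch below uses synthetic, made-up loss measurements (the numbers are not from any real run) to show the standard trick: a power law loss = a * N^(-b) is linear in log-log space, so ordinary least squares recovers the exponent:

```python
import numpy as np

# Hypothetical sweep: validation loss at several model sizes N.
# In practice these come from your own training runs; here they are
# synthetic, generated from loss = a * N^(-b) plus a little noise.
rng = np.random.default_rng(1)
N = np.array([1e5, 3e5, 1e6, 3e6, 1e7])
true_a, true_b = 5.0, 0.30
loss = true_a * N ** (-true_b) * np.exp(rng.normal(0.0, 0.02, N.size))

# log L = log a - b * log N, so a degree-1 fit in log-log space
# gives back the scaling exponent.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
a_hat, b_hat = np.exp(intercept), -slope
print(f"fitted: loss ~ {a_hat:.2f} * N^(-{b_hat:.3f})")
```

Five training runs and three lines of fitting are enough to get a first estimate of how loss scales with model size on your own problem.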
Basically, there are lots of avenues, tiny unit tests, and analysis techniques that would be
very valuable for debugging your networks, and yet most people rely on luck. Hopefully this
will change soon, and these are my Jupyter notebooks that I hope will help drive that change.
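As one concrete example of such a tiny unit test, here is a finite-difference gradient check on a toy logistic regression (a generic sketch, not code from the notebooks): if the analytic gradient disagrees with the numerical one, the backward pass has a bug, no luck required:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, X, y):
    """Binary cross-entropy for logistic regression, with analytic gradient."""
    p = sigmoid(X @ w)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = (rng.random(32) < 0.5).astype(float)
w = rng.normal(size=4)

_, grad = loss_and_grad(w, X, y)

# Central finite differences: perturb each weight by +/- eps and
# compare the numerical slope against the analytic gradient.
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    e = np.zeros_like(w)
    e[i] = eps
    lp, _ = loss_and_grad(w + e, X, y)
    lm, _ = loss_and_grad(w - e, X, y)
    numeric[i] = (lp - lm) / (2 * eps)

max_err = np.max(np.abs(grad - numeric))
print("max gradient error:", max_err)
```

The same check scales to any hand-written layer: if max_err isn't tiny, you have found your bug before wasting a single full training run.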
Besides strange tests, I also experimented with a lot of architectures here, from classical
ones like MLPs and CNNs to GANs, VAEs, ViTs, Transformers, YOLO, graph networks, collaborative
filtering, and diffusion models. I'd say it's pretty diverse. Over time, I explored many more
topics than deep learning alone, including bioinformatics, molecular biology, quantum
chemistry, signal processing, electrical engineering, benchmarking, search engines, and
economic and population simulations. I'm just really curious about how everything works.