taking a graduate class during my undergrad
My last term on campus with CS 784 (Computational Linguistics)
In my last term on campus at Waterloo, I decided to dedicate the term to learning about machine learning. I had already taken Waterloo’s computer vision class (CS 484) in second year, and felt like I should do a deeper dive into the fundamentals. In particular, I wanted a deeper understanding of machine learning, artificial intelligence, and natural language processing. Turns out, I was in luck! I could take:
- CS 480: Introduction to Machine Learning with Prof. Yaoliang Yu
- CS 486: Introduction to Artificial Intelligence with Prof. Jesse Hoey and Prof. Victor Zhong
- CS 784: Computational Linguistics with Prof. Freda Shi
This already sounds like a pretty heavy course load, but I decided to punish myself even more and take:
- CS 454: Distributed Systems with Prof. Samer Al Kiswany
- CO 331: Coding Theory with Prof. Alfred Menezes
Truly, I was a glutton for punishment.
CS 784 was a graduate-level class, and I’d never taken one before. I was concerned that the material would be too advanced for me, but I was pleasantly surprised to learn that math is still math: derivatives don’t change in graduate classes, and neither does expected value or matrix multiplication. In fact, coding theory (CO 331) and my machine learning class (CS 480) had some of the wackiest math I’d ever seen. (Don’t get me wrong, the math was cool: Reed-Solomon codes and BCH codes are fascinating, and abstract algebra is truly powerful, but, man, was it hard.)
Instead, Freda’s class had a greater focus on learning NLP and linguistics fundamentals, and then applying that knowledge to review literature and research papers. Our term project was to write a meta-analysis paper on a popular word in NLP, and I chose to write about “attention” (the term gets thrown around a lot, and I wanted to discern fact from fiction).
Learning about the attention mechanism and its implementations was fun, and I also enjoyed learning about its history. For example, I thought that the attention mechanism was created for the transformer architecture (Vaswani et al., 2017), but that credit actually goes to Bahdanau et al. (2014), whose team used attention in a seq2seq model for machine translation.
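If you haven’t seen the two formulations side by side, here is a rough NumPy sketch (my own illustration, not code from either paper): Vaswani-style scaled dot-product attention scores queries against keys with a dot product, while Bahdanau-style additive attention scores a decoder state against each encoder state with a small feed-forward network. All variable names and shapes below are assumptions for demonstration.

```python
# Illustrative sketch only; shapes and names are my own, not from either paper.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Vaswani et al. (2017): dot-product scores scaled by sqrt(d_k),
    softmaxed, then used to take a weighted sum of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)   # attention distribution per query
    return weights @ V                   # weighted sum of values

def additive_attention(query, keys, W_q, W_k, v):
    """Bahdanau et al. (2014): scores come from a small feed-forward net
    over the decoder state (query) and each encoder state (key)."""
    scores = np.tanh(query @ W_q + keys @ W_k) @ v  # (n_keys,)
    weights = softmax(scores)                        # alignment weights
    return weights @ keys                            # context vector

# Toy usage with random vectors, just to show the shapes line up.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
print(scaled_dot_product_attention(Q, K, V).shape)   # (2, 4)

query, keys = rng.normal(size=(4,)), rng.normal(size=(5, 4))
W_q, W_k, v = rng.normal(size=(4, 8)), rng.normal(size=(4, 8)), rng.normal(size=(8,))
print(additive_attention(query, keys, W_q, W_k, v).shape)  # (4,)
```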
If you’re interested in learning more about the state of attention as of April 28, 2025, I hope you can find something useful in my paper! This is the first machine learning paper I’ve ever written, and I hope to write something later that can push the field even a smidge forward.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. https://arxiv.org/abs/1706.03762
- Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. https://arxiv.org/abs/1409.0473