One project I worked on during my last quarter at Stanford was an unambiguous caption generator. This was a final project for CS224n: Natural Language Processing with Deep Learning. The task we chose was to generate a caption that describes a target image, but not a distractor image.
As an example, let's say we were given an image of a small, brown dog running through a field as the target and an image of a small, brown dog sitting on a couch as the distractor. An ambiguous caption would read "a small, brown dog," as that could describe either image. Our goal was to generate an unambiguous caption, such as "a dog running through a field," which describes only the target image.
The diagram above shows our model architecture; you can read more about Google's related work on this task. This project was particularly rewarding because it combined several of my interests: image processing, language processing, deep learning, and working with very large datasets. Though we didn't have time to run every experiment we wanted to, this was easily one of the most engaging and challenging projects I've been a part of.
Below is the presentation poster from my final project in EE367: Computational Imaging and Display. Many projects I worked on had machine learning aspects, but this one was purely image processing. Our task was to simulate fog or smoke in an image, which we did by calculating a depth map from the disparity between two shifted versions of the same scene. This type of depth-aware scene rendering is essential for reliable simulations, autonomous vehicles, and virtual/augmented reality. This project was particularly fun because it was so flexible: once the depth map was calculated, we were able to experiment with other techniques, such as image refocusing and a bokeh effect.
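The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it assumes the standard stereo relation (depth inversely proportional to disparity) and the common atmospheric scattering model for fog; the function names and parameter values are hypothetical.

```python
import numpy as np

def depth_from_disparity(disparity, focal_length, baseline, eps=1e-6):
    # Standard stereo geometry: z = f * b / d, guarding against zero disparity.
    return focal_length * baseline / np.maximum(disparity, eps)

def add_fog(image, depth, beta=0.05, airlight=0.8):
    # Atmospheric scattering model: I = J * t + A * (1 - t),
    # where transmission t = exp(-beta * depth) falls off with distance.
    t = np.exp(-beta * depth)[..., np.newaxis]
    return image * t + airlight * (1.0 - t)

# Toy example: a uniform disparity map over a small gray image.
disparity = np.full((2, 2), 8.0)
depth = depth_from_disparity(disparity, focal_length=700.0, baseline=0.1)
image = np.full((2, 2, 3), 0.5)
foggy = add_fog(image, depth)
```

Pixels with larger depth get lower transmission, so their color is pulled further toward the airlight, which is what produces the hazy look.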
I designed and wrote this website from scratch to practice using HTML and CSS and to try my hand at something a bit more creative!