August 26th, 2016
Joe Ellis ’17SEAS and Dan Mozoroff ’17GSAS are two rogue Columbia PhDs building vidRovr, a multimodal computer vision and machine learning system that can ingest hours of video, understand the content, and index the videos accordingly.
It started with a prototype called NewsRover, aimed at finding some meaning in the madness of television. It wasn’t supposed to be a company, just a side project that let the new PhDs work across disciplines (one was studying machine learning and computer vision, the other neuroscience).
“NewsRover was effectively a system trying to solve the problem that television news viewership has been steadily declining for the past ten years,” Joe says. “We wanted to build a system that did aggregation, understanding, and searchability for television. We built [it] and recorded a hundred hours of television news a day, directly from cable, then took an hour- or thirty-minute program and chopped that up into coherent topic segments.”
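The pipeline Joe describes (record a broadcast, chop it into coherent topic segments, make those segments searchable) can be sketched in miniature. The snippet below is a hypothetical illustration rather than vidRovr’s actual code: it splits a toy news transcript into segments wherever the vocabulary shifts and builds a simple keyword index over the result.

```python
# Hypothetical sketch of the NewsRover idea: chop a transcript into
# topic segments and make them searchable. Illustrative only; this is
# not vidRovr's pipeline.
from collections import defaultdict

def segment_by_topic(sentences, overlap_threshold=0.2):
    """Start a new segment when a sentence shares too little
    vocabulary with the segment built so far (a crude topic shift)."""
    segments, current, vocab = [], [], set()
    for sentence in sentences:
        words = set(sentence.lower().split())
        if current and len(words & vocab) / max(len(words), 1) < overlap_threshold:
            segments.append(" ".join(current))
            current, vocab = [], set()
        current.append(sentence)
        vocab |= words
    if current:
        segments.append(" ".join(current))
    return segments

def build_index(segments):
    """Inverted index mapping each word to the segments containing it."""
    index = defaultdict(set)
    for seg_id, segment in enumerate(segments):
        for word in segment.lower().split():
            index[word].add(seg_id)
    return index

def search(index, segments, query):
    """Return segments containing every word of the query."""
    hits = set.intersection(*(index.get(w, set()) for w in query.lower().split()))
    return [segments[i] for i in sorted(hits)]

if __name__ == "__main__":
    transcript = [
        "The senate passed the budget bill today.",
        "Lawmakers debated the budget for six hours.",
        "In sports, the Yankees won their third straight game.",
        "The Yankees face Boston tomorrow night.",
    ]
    segments = segment_by_topic(transcript)
    index = build_index(segments)
    print(search(index, segments, "budget"))
```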
Search is one of the best-developed technologies the industry has, but only when it is constrained to text. Video, which is encroaching on more and more of text’s monopoly, is still hard to surface accurately. After building an algorithm for doing just that, Joe and Dan realized the potential they were sitting on: what began as an experiment in computer vision could become a valuable processing tool for large collections of video.