Tombone's Computer Vision Blog: linux

Monday, January 06, 2014

You asked, we listened. VMX will be available to run locally.

The following post is a result of my team launching a Kickstarter campaign two weeks ago and upgrading one of our rewards based on all the feedback we received from backers and potential backers. We initially intended to launch the VMX project as a service, meaning that it would only run over an internet connection to our servers. But there were scenarios where this was not appropriate. Some people didn't have a fast enough internet connection at home, some people were worried that it would be too expensive to use our product, and some people couldn't use software which required an internet connection at work. The VMX Project, our flagship computer vision in-the-browser software, will now run using a local object detection server.

VMX Project: Computer Vision for Everyone

(Cross-posted from the VMX Project Kickstarter January 5, 2014 update and a post on blog.vision.ai)

Over the last few weeks, we've listened to many backers (and potential backers) talk about our technology and would like to thank everyone who gave us valuable feedback. Many of you didn’t like VMX being offered only as a service (requiring an internet connection), so we decided to offer a local VMX installation in addition to making VMX available as a service. We didn’t anticipate such great demand for VMX running locally on people’s own computers and networks, but we are dedicated to letting developers have an exceptional computer vision experience and are eager to give our users what they want.

Once the early-access period (March 2014 - June 2014) is over, VMX developers will have the option to receive a single-machine VMX license and install VMX on their own computer. With VMX running on your computer, you won’t have to worry about running out of VMX Compute Hours or accidentally making your data public and, most importantly, it won’t require an internet connection. You will also have the option of communicating between VMX running on your computer and our servers. You will be able to download object detectors and the models you create during the early-access period, as well as back up your object models and import them into the VMX as-a-service servers.

During our official launch in Summer 2014, a single-machine VMX license will be available to VMX Developers for $100. Kickstarter backers will be able to simply trade in 100 of their VMX Compute Hours to obtain one single-machine license and download the software for their own use.

The local VMX software will be installable directly on a computer running Linux. For VMX developers running MS Windows or Apple OS X, we will provide a Linux Virtual Image for download which will contain a pre-installed and fully configured instance of VMX.

We hope this will make all VMX users more excited about our technology.

--the guys from VISION.AI

Wednesday, January 20, 2010

Heterarchies and Control Structure in Image Interpretation

Several days ago I was reading one of Takeo Kanade's classic computer vision papers from 1977 titled "Model Representation and Control Structure in Image Understanding" and I came across a new term, heterarchy. I think motivating this concept is as important as its definition. At the representational level, Kanade does a good job at advocating the use of multiple levels of representation -- from pixels to patches to regions to subimages to objects.

In addition to discussing the representational aspects of image understanding systems, Kanade analyzes different strategies for using knowledge in such systems (he uses the term control structure to signify the overall flow of information between subroutines). On one extreme is pass-oriented processing (this is Kanade's term -- I prefer to use the terms feed-forward or bottom-up) which relies on iteratively building higher levels of interpretation from lower ones. Marr's vision pipeline is mostly bottom-up, but that discussion will be left for another post. Another extreme is top-down processing, where the image is analyzed in a global-to-local fashion. Of course, as of 2010 these ideas are being used on a regular basis in vision. One example is the paper Learning to Combine Bottom-Up and Top-Down Segmentation by Levin and Weiss.

Kanade acknowledges that the flow of a vision algorithm is very much dependent on the representation used. For image understanding, bottom-up as well as top-down processing will both be critical components of the entire system. However, the exact strategy for combining these processes, in addition to countless other mid-level stages, is not very clear. Directly quoting Kanade, "The ultimate style would be a heterarchy, in which a number of modules work together like a community of experts with no strict central executive control." According to this line of thought, processing would occur in a loopy and cooperative style. Kanade attributes the concept of a heterarchy to Patrick Winston, who worked with robots in the golden days of AI at MIT. Like Kanade, Winston criticizes a linear flow of information in scene interpretation (this criticism dates back to 1971). The basic problem outlined by both Kanade and Winston is that modules such as line-finders and region-finders (think segmentation) are simply not good enough to be used in subsequent stages of understanding. In my own research I have used the concept of multiple image segmentations to bypass some of the issues with relying on the output of low/mid-level processing for high-level processing. In 1971 Winston envisioned an algorithmic framework that is a melange of subroutines -- a web of algorithms created by different research groups -- that would interact and cooperate to understand an image. This is analogous to the development of an operating system like Linux. There is no overall theory developed by a single research group that made Linux a success -- it is the body of hackers and engineers that produced a wide range of software products that make using Linux a success.
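To make the contrast between a linear pass and a heterarchy concrete, here is a toy Python sketch. It is not Kanade's or Winston's system: the three modules, the string labels such as "object:car?", and the shared "blackboard" set are all hypothetical stand-ins chosen for illustration. Each module reads whatever hypotheses are on the board and may add new ones, and the region finder can only refine its output once a higher-level module has posted a tentative object hypothesis.

```python
from typing import Callable, Iterable, Set

# Hypothetical shared pool of hypotheses that every module may read and extend.
Blackboard = Set[str]
Module = Callable[[Blackboard], Set[str]]


def line_finder(board: Blackboard) -> Set[str]:
    # Low-level module: proposes edges once raw pixels are available.
    return {"edges"} if "pixels" in board else set()


def region_finder(board: Blackboard) -> Set[str]:
    # Mid-level module: builds regions from edges (bottom-up) and refines
    # them only when an object hypothesis is already posted (top-down).
    out: Set[str] = set()
    if "edges" in board:
        out.add("regions")
    if "regions" in board and "object:car?" in board:
        out.add("regions:refined")
    return out


def object_proposer(board: Blackboard) -> Set[str]:
    # High-level module: guesses from coarse regions, confirms from refined ones.
    out: Set[str] = set()
    if "regions" in board:
        out.add("object:car?")
    if "regions:refined" in board:
        out.add("object:car")
    return out


def run_pipeline(modules: Iterable[Module], board: Blackboard) -> Blackboard:
    # Pass-oriented control: each module runs exactly once, in a fixed order.
    board = set(board)
    for module in modules:
        board |= module(board)
    return board


def run_heterarchy(modules: Iterable[Module], board: Blackboard,
                   max_rounds: int = 10) -> Blackboard:
    # No central executive: in every round all modules see the same board,
    # and rounds repeat until nobody has anything new to contribute.
    modules = list(modules)
    board = set(board)
    for _ in range(max_rounds):
        proposals: Set[str] = set()
        for module in modules:
            proposals |= module(board)
        if proposals <= board:
            break
        board |= proposals
    return board


MODULES = [line_finder, region_finder, object_proposer]

if __name__ == "__main__":
    print(sorted(run_pipeline(MODULES, {"pixels"})))
    print(sorted(run_heterarchy(MODULES, {"pixels"})))
```

In this toy setup the single pass stops at the tentative "object:car?" guess, because the region finder has already run by the time the guess appears. The looping version lets the guess feed back into the region finder and ends with a confirmed "object:car", regardless of the order in which the modules are listed.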

Unfortunately, given the tradition of computer vision research, I believe that an open-source-style group effort in this direction will not come out of university-style research (which is overly coupled with the publishing cycle). It would be a noble effort, but would be more of a feat of engineering and not science. Imagine a group of 2-3 people creating an operating system from scratch -- it seems like a crazy idea in 2010. However, computer vision research is often done in such small teams (actually there is often a single hacker behind a vision project). But maybe going open-source and allowing several decades of interaction will actually produce usable image understanding systems. I would like to one day lead such an effort -- being both the theoretical mastermind as well as the hacker behind this vision. I am an INTJ, hear me roar.
