What is Computer Vision?

Every textbook on computer vision starts with this question, but in reality there is no unique answer. Originally, the study of “computational vision” (which we now call computer vision) was motivated by the desire to create artificial systems that mimicked biological (in particular, human) systems to a greater or lesser degree. Indeed, if you look at much of the early literature in computer vision, dating from the late 50’s and early 60’s, you will see a strong tendency to relate computational results to biology, cognitive psychology, and psychophysics.

Over roughly the next 20 years, both our understanding of computer vision and the technology for performing it advanced significantly. The 70’s and 80’s saw many results in low-level computer vision: edge and feature detection, image matching methods, segmentation, photometric vision, and so forth. There were also many results in what is sometimes referred to as mid-level vision, particularly computational stereo, optical flow and motion analysis. There was, however, relatively little progress on “high-level” vision problems such as object recognition, despite intensive work in the area.

One interesting aspect of this period is a gradual divorcing of computer vision from biological vision, and the emergence of computer vision as an engineering science. In part, this was driven by the increasing sophistication of the mathematical and computational techniques that were brought to bear on the problem, and in part it was a reflection of the growing use of computer vision in industry.

The last 20 years have seen an acceleration of these trends. In the consumer market, the rapid movement toward digital imaging has created huge archives of digital image data. The increasing prevalence of the Web has added the possibility of mining the surrounding text content to “annotate” the associated images. YouTube and other similar video sites have brought significant video data online. Increasingly cheap computational power and cameras have brought advanced digital imaging into the home and the commercial arena in ways that could not have been foreseen even ten years ago. Finally, the commercialization of digital imaging has brought enormous capital to bear on the problem.

At the same time, there have been significant paradigmatic shifts in computer vision, driven in part by breakthroughs on several fronts. Perhaps the most significant change has been the move toward “data-driven” approaches to computer vision. For example, the development of new feature-based approaches to matching in large image databases, and the associated training of classifiers on these features, has transformed certain aspects of object recognition from an insoluble problem to a practically solved problem in only a decade.
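
To give a feel for what feature-based matching involves, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method of any particular system: each image feature is assumed to be summarized by a short numeric descriptor vector, the descriptors below are made-up toy numbers, and the function name match_descriptors and the ratio threshold of 0.8 are hypothetical choices for this example. A query descriptor is matched to its nearest database descriptor only if that neighbor is clearly closer than the second nearest (a common heuristic known as a ratio test).

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, database, ratio=0.8):
    """Return (query_index, database_index) pairs that pass the ratio test.

    A match is kept only if the nearest database descriptor is clearly
    closer than the second nearest (best < ratio * second).
    The name and the default ratio are assumptions made for this sketch.
    """
    matches = []
    for qi, q in enumerate(query):
        # Rank all database descriptors by distance to this query descriptor.
        ranked = sorted((distance(q, d), di) for di, d in enumerate(database))
        if len(ranked) < 2:
            continue  # the ratio test needs at least two candidates
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


# Toy descriptors (made-up numbers, not real image features).
database = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
query = [
    [0.9, 0.1, 0.0],  # close to database[1]
    [0.5, 0.5, 0.0],  # equally close to database[1] and database[2]
]

matches = match_descriptors(query, database)
print(matches)
```

The first query descriptor is matched, while the second is rejected as ambiguous because its two nearest neighbors are equally distant. This brute-force search compares every query against every database entry; matching in genuinely large image databases would need an indexing structure, which this sketch leaves out.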

Some Examples of Applications of Computer Vision

The applications of computer vision and related techniques are broad and growing. Here are but a few examples.

Industrial Inspection

Electronics inspection systems

BrainTech industrial vision-guided robotics

Commercial Broadcasting

Super Bowl EyeVision



Video textures

Optical Motion Capture


Pathology Search


Claron Technology Inc: Tracking/Visualization



Optical Character Recognition

User interface

Face Detection and Recognition

Toyota Night View Pedestrian Detection

Eagle Eye Tennis Ball tracker

Incogna Image Search

The Challenges of Computer Vision

Computer Vision will never be a “solved” problem. Indeed, as our understanding of the field and its implications grows, so do the areas of investigation. Here are a few examples of areas where there is currently a great deal of activity.

Engineering Challenges

As vision moves more and more into the application realm, the engineering development for vision systems is becoming a larger issue. There are, however, a variety of challenges confronting the designer of a vision-based system. Here are a few:

  1. Lack of Toolkits: It is interesting to contrast computer vision with graphics, which arguably made a similar transition about 25 years ago as high-resolution screens and GUIs became commonplace. A graphics designer now has access to unbelievable technology (e.g. GPUs) supported by extremely advanced software libraries. With relatively little graphics knowledge and a few lines of code, it is possible to create incredible images and animations. Toolkits for vision, e.g. OpenCV, are just beginning to emerge, though some specialized toolkits, e.g. for inspection, have been widely commercialized.
  2. Lack of Established Paradigms: It is also interesting to contrast vision with speech, which has also made great progress in the past two decades. A modern speech system consists of many components: basic signal processing, trained statistical models, lexicons, interpretation rules, and so forth. Many of these components are relatively well understood, and together they form a complex processing pipeline. Engineering work is largely “tweaking” this pipeline in the right way for the application in question. In some areas of vision, e.g. inspection, there are comparable pipelines, but in general the overall architecture of vision systems remains to be developed.
  3. Lack of Established Performance Metrics: How well should a vision system be expected to work? To be practical, any system must work with high reliability. A designer of a control system, or a mechanical device, or a large software system understands how to create systems with such high levels of reliability. Vision systems are not yet understood well enough for similar guarantees to be designed in. Thus, building vision systems is still a “high risk” business.

Beyond this, there are still computational challenges to developing vision systems, testing challenges (in particular, defining a range of acceptable testing situations), and so forth. That being said, it is also quite impressive what can be achieved with the engineering tools available, and these tools will only improve.

Conceptual Challenges

While there are now components of a vision system that are becoming well understood (e.g. computational stereo or feature matching), the inter-relationships among these components are still a huge stumbling block to computer vision. In human vision, it is clear, e.g. from optical illusions, that we rely on multiple cues and strong structural priors to interpret imagery. Indeed, there are many cases where humans can make sense of images which are full of distraction, noise, clutter, odd lighting, and so forth. How we do so remains a mystery.

Another large conceptual challenge in imagery is the lack of an underlying structure for even describing image content. When we see an image, we can make judgements about specific objects in the scene, how they relate to one another, what they imply and so forth. For example, in an urban scene, we can spot cars, people, signs and so forth. We infer roads from several cues: cars, buildings, traffic markings. But we can also abstract and discern whether this is a modern scene or one from the past. We can tell if it is a large city or a small town. We can guess time of day, season of year. We can spot the unusual: why is there a tiger on the street? We can even see the image as part of a larger picture or longer story. How can we link these interpretations computationally? What is the background model we bring to the task, and what are the relationships we need to infer to relate to these models?

There are numerous more specific conceptual challenges. We will encounter several during this course.
