Machine vision (coursework)

Machine vision: what is it and how is it used? Processing images from an optical source

Machine vision is a scientific field within artificial intelligence, in particular robotics, together with the related technologies for acquiring images of real-world objects, processing them, and using the resulting data to solve applied problems of various kinds without full, or with only partial, human participation.

Historical breakthroughs in machine vision

Vision System Components

  • One or more digital or analog cameras (black and white or color) with suitable optics to capture images
  • Software for producing images for processing. For analog cameras this is an image digitizer
  • Processor (modern PC with a multi-core processor or built-in processor, for example - DSP)
  • Computer vision software that provides tools for developing individual software applications.
  • Input/output equipment or communication channels for reporting findings
  • Smart camera: one device that includes all of the above points.
  • Very specialized light sources (LEDs, fluorescent and halogen lamps, etc.)
  • Specific software applications for image processing and detection of relevant properties.
  • A sensor for synchronizing detection parts (often an optical or magnetic sensor) for image capture and processing.
  • Actuators of an appropriate form used to sort or reject defective parts.

Machine vision focuses primarily on industrial applications, such as autonomous robots and systems for visual inspection and measurement. This means that image sensor technology and control theory are coupled with the processing of video data to control the robot, and real-time processing of the resulting data is carried out in software or hardware.

Image processing and image analysis mainly focus on working with 2D images, i.e. how to convert one image to another. For example, pixel-by-pixel operations to increase contrast, operations to highlight edges, remove noise, or geometric transformations such as image rotation. These operations assume that image processing/analysis operates independently of the content of the images themselves.

Computer vision focuses on processing three-dimensional scenes projected onto one or more images. For example, by restoring the structure or other information about a 3D scene from one or more images. Computer vision often depends on more or less complex assumptions about what is represented in images.

There is also a field called visualization, which was originally associated with the process of creating images, but sometimes dealt with processing and analysis. For example, radiography works with the analysis of video data for medical applications.

Finally, pattern recognition is a field that uses various methods to extract information from video data, mainly based on a statistical approach. Much of this field is devoted to the practical application of these methods.

Thus, we can conclude that the umbrella concept of "machine vision" today includes computer vision, visual pattern recognition, image analysis and processing, and related fields.

Computer vision tasks

  • Recognition
  • Identification
  • Detection
  • Text recognition
  • Restoring 3D shape from 2D images
  • Motion estimation
  • Scene restoration
  • Image recovery
  • Identification of structures of a certain type in images, image segmentation
  • Optical Flow Analysis

Recognition


A classic problem in computer vision, image processing, and machine vision is determining whether video data contains some characteristic object, feature, or activity.

This problem can be solved reliably and easily by a human, but it has not yet been satisfactorily solved in computer vision in the general case of arbitrary objects in arbitrary situations.

One or more predefined or learned objects or classes of objects can be recognized (usually along with their two-dimensional position in the image or three-dimensional position in the scene).

Identification


An individual instance of an object belonging to a class is recognized.
Examples: identification of a specific human face or fingerprint or vehicle.

Detection


The video data is checked for a certain condition.

Detection based on relatively simple and fast calculations is sometimes used to find small areas in the analyzed image, which are then analyzed using more resource-intensive techniques to obtain the correct interpretation.

Text recognition


Searching images by content: Finding all images in a large set of images that have content defined in various ways.

Position estimation: Determining the position or orientation of a certain object relative to the camera.

Optical character recognition: recognition of characters in images of printed or handwritten text, usually in order to convert the text into a format convenient for editing or indexing (for example, ASCII).

Restoring a 3D shape from 2D images is carried out using stereo reconstruction of a depth map, reconstruction of the normal field and depth map from the shading of a grayscale image, reconstruction of a depth map from texture, and determination of shape from displacement.

An example of restoring a 3D shape from a 2D image

Motion estimation

Motion estimation covers several problems in which a sequence of images (video data) is processed to estimate the velocity of each point of the image or the 3D scene. Examples of such tasks are determining three-dimensional camera motion and tracking, that is, following the movements of an object (for example, cars or people).

Scene restoration

Given two or more images of a scene, or video data, the task of scene reconstruction is to recreate a three-dimensional model of the scene. In the simplest case the model can be a set of points in three-dimensional space. More sophisticated methods reproduce a full three-dimensional model.

Image recovery


The task of image restoration is to remove noise (sensor noise, blur of a moving object, etc.).

The simplest approach to this problem is to use various types of filters, such as low-pass or median filters.

Higher levels of noise removal are achieved by first analyzing video data for various structures, such as lines or edges, and then controlling the filtering process based on that data.
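To make the filtering idea concrete, here is a small pure-Python sketch (not the program described in this work; the image values are illustrative) of a 3x3 mean (box) filter, the simplest low-pass filter for suppressing isolated noise spikes:

```python
# A minimal sketch of low-pass filtering for noise removal: a 3x3 mean
# (box) filter replaces each interior pixel with the average of its
# neighborhood, smoothing out isolated noise spikes.

def box_filter(image):
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]          # border pixels are left unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neighborhood = [image[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sum(neighborhood) // 9
    return out

# A flat gray image with one bright noise spike in the middle
noisy = [[100, 100, 100],
         [100, 190, 100],
         [100, 100, 100]]
print(box_filter(noisy))  # the central spike is averaged down to 110
```

Structure-aware denoising, as described above, would first detect lines or edges and then restrict such averaging so it does not blur them.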

Optical flow analysis

Optical flow analysis (finding the movement of pixels between two images).
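A simplified, hedged sketch of how motion between two frames can be estimated: block matching, where a template block from the first frame is searched for in the second frame by minimizing the sum of absolute differences (SAD). Frame contents and block coordinates here are purely illustrative:

```python
# Block-matching motion estimation: find the displacement (dx, dy) of a
# block between two frames by minimizing the sum of absolute differences.

def sad(frame_a, frame_b, ax, ay, bx, by, size):
    return sum(abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
               for j in range(size) for i in range(size))

def match_block(frame_a, frame_b, x, y, size, search):
    """Return (dx, dy) moving the block at (x, y) from frame_a into frame_b."""
    h, w = len(frame_b), len(frame_b[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx <= w - size and 0 <= ny <= h - size:
                cost = sad(frame_a, frame_b, x, y, nx, ny, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

# A bright 2x2 patch moves one pixel right and one pixel down.
frame1 = [[0] * 5 for _ in range(5)]
frame2 = [[0] * 5 for _ in range(5)]
for j in range(2):
    for i in range(2):
        frame1[1 + j][1 + i] = 255
        frame2[2 + j][2 + i] = 255
print(match_block(frame1, frame2, 1, 1, 2, 2))  # (1, 1)
```

Dense optical flow methods apply this idea (or gradient-based variants of it) to every pixel rather than to a single block.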

Image processing methods

Pixel counter

Counts the number of light or dark pixels.
Using a pixel counter, the user selects a rectangular area of the screen at a location of interest, for example where they expect to see the faces of passing people. The camera immediately responds with information about the number of pixels along the sides of the rectangle.

The pixel counter allows you to quickly check whether a mounted camera meets regulatory or customer pixel resolution requirements, for example for the faces of people entering camera-monitored doors or for license plate recognition purposes.
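A hypothetical pure-Python sketch of a pixel counter (the function name, threshold, and test image are illustrative assumptions, not part of any specific camera's API):

```python
# Pixel counter: counts pixels in a rectangular region of interest (ROI)
# that are brighter (or darker) than a threshold. The image is a plain
# list of rows of grayscale values (0-255).

def count_pixels(image, roi, threshold=128, count_light=True):
    """Count light (or dark) pixels inside roi = (x, y, width, height)."""
    x, y, w, h = roi
    count = 0
    for row in image[y:y + h]:
        for value in row[x:x + w]:
            if (value >= threshold) == count_light:
                count += 1
    return count

# A tiny 4x4 test image: the top half is bright, the bottom half dark.
image = [[200, 210, 190, 205],
         [220, 215, 225, 230],
         [30, 25, 40, 35],
         [20, 15, 10, 45]]

print(count_pixels(image, roi=(0, 0, 4, 2)))                     # light pixels in the top half
print(count_pixels(image, roi=(0, 2, 4, 2), count_light=False))  # dark pixels in the bottom half
```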

Binarization


Converts a grayscale image to binary (white and black pixels).
The values of each pixel are conventionally coded as "0" and "1". Pixels with the value "0" are called the background, and those with the value "1" the foreground.

Often when storing digital binary images, a bitmap is used, where one bit of information is used to represent one pixel.

Historically, especially in the early stages of the technology, the two colors used were black and white, though this is not mandatory.
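A minimal pure-Python sketch of binarization using Otsu's criterion (choosing the threshold that maximizes the between-class variance of the histogram); the image values are illustrative:

```python
# Otsu binarization: pick the threshold maximizing between-class variance
# of the grayscale histogram, then code pixels as 0 (background) and
# 1 (foreground).

def otsu_threshold(pixels):
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * hist[i] for i in range(256))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]                # background weight
        if w0 == 0:
            continue
        w1 = total - w0              # foreground weight
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(image, threshold):
    return [[1 if p > threshold else 0 for p in row] for row in image]

image = [[10, 12, 200], [11, 250, 240], [9, 13, 230]]
t = otsu_threshold([p for row in image for p in row])
print(binarize(image, t))
```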

Segmentation

Used to search and/or count parts.

The purpose of segmentation is to simplify and/or change the representation of an image so that it is simpler and easier to analyze.

Image segmentation is commonly used to highlight objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning labels to each pixel in an image such that pixels with the same labels share common visual characteristics.

The result of image segmentation is a set of segments that together cover the entire image, or a set of contours extracted from the image. All pixels in a segment are similar in some characteristic or calculated property, such as color, brightness, or texture. Neighboring segments differ significantly in this characteristic.
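The simplest form of segmentation for finding and counting parts is connected-component labeling of a binary image. A hedged pure-Python sketch (the flood-fill approach and the test image are illustrative, not the method of any particular package):

```python
# Label connected components ("blobs") of foreground pixels in a binary
# image using a breadth-first flood fill with 4-connectivity.
from collections import deque

def label_blobs(binary):
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] == 1 and labels[sy][sx] == 0:
                next_label += 1
                queue = deque([(sy, sx)])
                labels[sy][sx] = next_label
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return next_label, labels

binary = [[1, 1, 0, 0],
          [1, 0, 0, 1],
          [0, 0, 0, 1],
          [1, 0, 0, 1]]
count, labels = label_blobs(binary)
print(count)  # three separate blobs
```

Each resulting label set is one segment: its pixels share the chosen characteristic (here, foreground membership), and neighboring segments differ in it.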

Reading Barcodes


A barcode is graphical information applied to the surface, label, or packaging of a product to make it readable by technical means: a sequence of black and white stripes or other geometric shapes.
In machine vision, barcode reading means decoding 1D and 2D codes designed to be read or scanned by machines.

Optical Character Recognition

Optical Character Recognition: Automated reading of text such as serial numbers.

OCR is used to convert books and documents into electronic form, to automate business accounting systems, or to publish text on a web page.

OCR makes it possible to edit the text, search for words or phrases, store it in a more compact form, display or print the material without loss of quality, analyze the information, and apply machine translation, formatting, or text-to-speech conversion.

My program, written in LabVIEW, for working with images

Computer vision was used for non-destructive quality control of superconducting materials.

Introduction. Solving the problems of comprehensive security (anti-terrorist and mechanical protection of facilities as well as the technological safety of engineering systems) currently requires systematic monitoring of the current state of those facilities. Among the most promising ways to monitor the current state of objects are optical and optoelectronic methods based on technologies for processing video images from an optical source. These include programs for working with images, modern image processing methods, and equipment for acquiring, analyzing, and processing images; that is, the set of tools and methods that make up the field of computer and machine vision. Computer vision is a general set of techniques that allow computers to see and recognize three- or two-dimensional objects, whether engineering or non-engineering. Working with computer vision requires digital or analog input/output devices, as well as computer networks and IP location analyzers, designed to control the production process and prepare information for making operational decisions in the shortest possible time.

Formulation of the problem. Today the main task for computer vision systems under design remains the detection, recognition, identification, and qualification of potential-risk objects located at a random place within the complex's area of operational responsibility. The currently existing software products aimed at these problems have a number of significant disadvantages: considerable complexity associated with the high detail of optical images, high power consumption, and a rather narrow range of capabilities. Extending the detection of potential-risk objects to the search for arbitrary objects in arbitrary situations at an arbitrary location is not possible with existing software products, even with the use of a supercomputer.

Objective. Development of a universal program for processing images from an optical source, capable of streaming data analysis; that is, the program must be light and fast enough to run on a small-sized computing device.

Tasks:

  • development of a mathematical model of the program;
  • writing a program;
  • testing the program in a laboratory experiment, with full preparation and conduct of the experiment;
  • research into the possibility of using the program in related areas of activity.

Analysis of the relevance of the program's development. The relevance is determined by:

  • the lack of image processing programs on the software market that provide detailed analysis of the engineering components of objects;
  • constantly growing requirements for the quality and speed of obtaining visual information, which sharply increase the demand for image processing programs;
  • the need for programs that combine high performance, reliability, and ease of use, which is extremely difficult to achieve today. Take Adobe Photoshop as an example: this graphic editor harmoniously combines functionality and ease of use for the average user, but it cannot perform complex image processing (for example, image analysis by constructing a mathematical relationship (function), or integral image processing);
  • the high cost of professional visual information processing programs: when such software is of high quality, its price is extremely high, down to the individual functions of a particular program suite. The graph below shows the price/quality relationship for simple analogues of the program.

To simplify the solution of problems of this type, I developed a mathematical model and wrote a program for a computer device for image analysis using simple transformations of source images.

The program works with transformations such as binarization, brightness, image contrast, etc. The operating principle of the program is demonstrated using the example of the analysis of superconducting materials.

When creating composite superconductors based on Nb3Sn, the volume ratio of bronze and niobium, the size and number of fibers in it, the uniformity of their distribution over the cross section of the bronze matrix, and the presence of diffusion barriers and stabilizing materials are varied. For a given volume fraction of niobium in a conductor, an increase in the number of fibers leads, accordingly, to a decrease in their diameter. This leads to a noticeable increase in the Nb/Cu-Sn interaction surface, which significantly accelerates the process of growth of the superconducting phase. Such an increase in the amount of the superconducting phase with an increase in the number of fibers in the conductor ensures an increase in the critical characteristics of the superconductor. In this regard, it is necessary to have a tool to control the volume fraction of the superconducting phase in the final product (composite superconductor).

When creating the program, the importance of conducting research into the materials from which superconducting cables are created was taken into account, since if the ratio of niobium to bronze is incorrect, an explosion of the wires is possible, and, consequently, human casualties, monetary costs and loss of time. This program allows you to determine the quality of wires based on a chemical and physical analysis of the object.

Program block diagram


Description of the research stages.

Stage 1. Sample preparation: cutting a composite superconductor on an electrical discharge machine; pressing the sample into a plastic matrix; polishing the sample to a mirror finish; etching the sample to highlight niobium fibers on a bronze matrix. Samples of pressed composite superconducting samples were obtained;

Stage 2. Imaging: obtaining metallographic images using a scanning electron microscope.

Stage 3. Image processing: creation of a tool for determining the volume fraction of the superconducting phase in a metallographic image, and collection of statistically significant data for a specific type of sample. Mathematical models of various image processing tools were created; a software tool was developed to estimate the volume fraction of the superconducting phase; the program was simplified by combining several mathematical functions into one. The average volume fraction of niobium fibers in the bronze matrix was 24.7±0.1%. The low deviation indicates high repeatability of the composite wire's structure.
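The core measurement in this stage reduces to a simple computation after binarization: the volume (area) fraction of the superconducting phase is estimated as the share of foreground pixels in the metallographic image. A hedged pure-Python sketch (the threshold, the test image, and the assumption that bright pixels correspond to niobium fibers are all illustrative, not taken from the actual LabVIEW program):

```python
# Estimate the area fraction of one phase in a metallographic image as
# the ratio of thresholded (foreground) pixels to all pixels.

def area_fraction(image, threshold):
    total = foreground = 0
    for row in image:
        for p in row:
            total += 1
            if p > threshold:   # bright pixels = niobium fibers (assumed)
                foreground += 1
    return foreground / total

image = [[30, 200, 40, 35],
         [25, 210, 45, 50],
         [20, 30, 220, 40],
         [35, 45, 240, 30]]
print(area_fraction(image, threshold=128))  # 4 of 16 pixels -> 0.25
```

Averaging this fraction over many images of the same sample type yields the statistically significant estimate reported above.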

Electron microscopy images of composite superconductors

Image processing methods in the program.

  • Identification: an individual instance of an object belonging to a class is recognized.
  • Binarization: the process of converting a color (or grayscale) image into a two-color, black-and-white one.
  • Segmentation: the process of dividing a digital image into multiple segments (sets of pixels, also called superpixels).
  • Erosion: a process in which a structuring element passes over all pixels of the image; if at some position every "one" pixel of the structuring element coincides with a "one" pixel of the binary image, the output pixel under the element's anchor is set to one.
  • Dilation: convolution of an image, or a selected region of an image, with some kernel. The kernel can have any shape and size; a single anchor position is designated in the kernel and is aligned with the current pixel while the result is computed.
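The morphological operations above can be sketched in a few lines of pure Python. This minimal version assumes a 3x3 square structuring element with a central anchor (a common default; real systems allow arbitrary shapes and anchors):

```python
# Binary erosion and dilation with a 3x3 square structuring element.

def erode(binary):
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # the output pixel is 1 only if the element fits entirely
            if all(binary[y + dy][x + dx] == 1
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out

def dilate(binary):
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # the output pixel is 1 if any neighbor under the element is 1
            out[y][x] = int(any(
                0 <= y + dy < h and 0 <= x + dx < w and binary[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

square = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
print(erode(square))   # only the central pixel of the 3x3 block survives
```

Erosion shrinks foreground regions (removing small specks), while dilation grows them (filling small holes); applied in sequence they give the opening and closing operations.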

Program formulas

Binarization formula (Otsu method):

Erosion formula:

Dilation formula:

Illustration of dilation and erosion

Color threshold segmentation formulas:

Determining the brightness gradient module for each image pixel:

Threshold calculation:
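The formulas named above appeared as images in the original document. As a hedged reconstruction, their standard textbook forms are given below; the author's exact notation, and the specific threshold-calculation rule used by the program, are not recoverable from the text (a common choice is to apply Otsu's criterion to the gradient histogram):

```latex
% Otsu binarization: choose the threshold t maximizing between-class variance
\sigma_b^2(t) = \omega_0(t)\,\omega_1(t)\,\bigl[\mu_0(t) - \mu_1(t)\bigr]^2
\quad\longrightarrow\quad \max_t

% Erosion and dilation of a binary image A by a structuring element B
A \ominus B = \{\, z \mid B_z \subseteq A \,\}, \qquad
A \oplus B = \bigcup_{b \in B} A_b

% Brightness gradient magnitude at each pixel of image I
|\nabla I| = \sqrt{\left(\frac{\partial I}{\partial x}\right)^{2}
                 + \left(\frac{\partial I}{\partial y}\right)^{2}}
```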

Used equipment

Program interface

There are many things in the world that the human eye simply cannot keep track of. In conveyor production, for example, errors occur precisely because of the human factor: a person is simply unable to evaluate products soberly after several hours of work. Robots are well suited to this. Using machine vision, they can inspect a product in detail, compare it with a sample, and instantly decide on its further processing.

How does computer vision work?

Computer vision is the ability of a computer to "see." A machine vision system uses one or more video cameras plus analog-to-digital conversion (ADC) and digital signal processing (DSP) hardware. The resulting data goes to a computer or robot controller. Computer vision is similar in complexity to speech recognition.

Two important characteristics of any such system are sensitivity and resolution. Sensitivity is the machine's ability to see in dim light or to detect weak signals at invisible wavelengths. Resolution is the degree to which the system distinguishes objects. Sensitivity and resolution are interdependent: as sensitivity increases, resolution usually decreases, and vice versa, all other factors being equal.

Human eyes are sensitive to electromagnetic waves with wavelengths from 390 to 770 nanometers. Video cameras can cover a much wider range: there are machine vision systems that see in the infrared, ultraviolet, and X-ray regions.

Machine vision is used in various industrial and medical fields:

  • Component analysis
  • Signature identification
  • Optical character recognition
  • Handwriting recognition
  • Object recognition
  • Pattern recognition
  • Material control
  • Currency control
  • Medical image analysis




Automatix's early Autovision II computer vision system, demonstrated at a trade show in 1983. The camera on a tripod points down at a backlit table to produce a clear image on the screen, which is then subjected to blob analysis.

Machine vision is the application of computer vision to industry and manufacturing. While computer vision is a general set of techniques allowing computers to see, the area of interest of machine vision as an engineering discipline is digital input/output devices and computer networks designed to monitor manufacturing equipment, such as robotic arms or machines for removing defective products. Machine vision is a subfield of engineering related to computer science, optics, mechanical engineering, and industrial automation. One of its most common applications is the inspection of industrial products such as semiconductor chips, automobiles, food, and pharmaceuticals. People working on assembly lines used to inspect parts of the product and draw conclusions about the quality of workmanship; machine vision systems for these purposes use digital and smart cameras, together with image processing software, to perform similar checks.

Introduction

Machine vision systems are programmed to perform highly specialized tasks, such as counting objects on an assembly line, reading serial numbers, or searching for surface defects. The benefits of a machine-vision-based visual inspection system include high operating speed under growing throughput, the ability to work around the clock, and repeatable measurement accuracy. Machines also have an advantage over people in being free of fatigue, illness, and inattention. However, humans have finer perception over short periods and greater flexibility in classifying defects and adapting to search for new ones.

Computers cannot "see" in the same way humans do. Cameras are not equivalent to the human visual system, and while humans can rely on guesswork and assumptions, machine vision systems must "see" by examining the individual pixels of an image, processing them, and attempting to draw conclusions using a knowledge base and functions such as pattern recognition engines. Although some computer vision algorithms have been developed to mimic human visual perception, a large number of unique methods have also been developed for processing images and determining their relevant properties.

Vision System Components

Although machine vision is the process of applying computer vision to industrial applications, it is useful to list commonly used hardware and software components. A typical machine vision system solution includes several of the following components:

  1. One or more digital or analog cameras (black and white or color) with suitable optics to capture images
  2. Software for producing images for processing. For analog cameras this is an image digitizer
  3. Processor (modern PC with a multi-core processor or built-in processor, for example - DSP)
  4. Computer vision software that provides tools for developing individual software applications.
  5. Input/output equipment or communication channels for reporting findings
  6. Smart camera: one device that includes all of the above points.
  7. Very specialized light sources (LEDs, fluorescent and halogen lamps, etc.)
  8. Specific software applications for image processing and detection of relevant properties.
  9. A sensor for synchronizing detection parts (often an optical or magnetic sensor) for image capture and processing.
  10. Actuators of an appropriate form used to sort or reject defective parts.

A synchronization sensor detects when a part moving along the conveyor is in position to be inspected. The sensor triggers the camera to take a picture of the part as it passes beneath it, and is often synchronized with a light pulse to capture a sharp image. The lighting is designed to highlight features of interest and to hide or minimize features that are not of interest (such as shadows or reflections). LED panels of suitable size and placement are often used for this purpose.

The image from the camera goes into a frame grabber, or directly into computer memory in systems without one. A frame grabber is a digitizing device (either part of a smart camera or a separate board in a computer) that converts the camera output into a digital format, typically a two-dimensional array of numbers corresponding to the light intensity at specific points in the field of view, called pixels, and places the image in computer memory so that it can be processed by machine vision software.

Software typically processes an image in several steps. Often the image is first pre-processed to reduce noise or to convert many shades of gray into a simple combination of black and white (binarization). After initial processing, the program counts, measures, and/or identifies objects, dimensions, defects, and other characteristics of the image. As a final step, the program passes or rejects the part according to specified criteria: if a part is defective, the software signals a mechanical device to reject it; alternatively, the system can stop the production line and alert a human worker to solve the problem, reporting what led to the failure.
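The steps just described can be sketched as a toy pass/reject routine. This is purely illustrative: the threshold, the defect criterion, and the idea that dark pixels are defects are all assumptions, not the logic of any real inspection system:

```python
# Toy inspection pipeline: binarize, measure a simple characteristic
# (defect area), then accept or reject the part against a criterion.

def inspect(image, threshold=128, max_defect_pixels=2):
    # Step 1: binarize (dark pixels are treated as defects here)
    defects = sum(1 for row in image for p in row if p < threshold)
    # Step 2: apply the acceptance criterion
    return "pass" if defects <= max_defect_pixels else "reject"

good_part = [[220, 230], [225, 240]]
bad_part = [[220, 10], [15, 20]]
print(inspect(good_part))  # pass
print(inspect(bad_part))   # reject
```

In a real system the "reject" branch would drive the actuator or stop the line, as described above.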

Although most machine vision systems rely on "black and white" cameras, the use of color cameras is becoming more common. In addition, vision systems are increasingly using direct-connect digital cameras rather than cameras with a separate frame grabber, reducing costs and simplifying the system.

Smart cameras with built-in processors are capturing an increasing share of the machine vision market. The use of embedded (and often optimized) processors eliminates the need for a frame grabber card and an external computer, reducing the cost and complexity of the system while giving each camera its own processing power. Smart cameras are generally less expensive than systems consisting of a camera and an external computer, and the growing power of onboard processors and DSPs can often achieve performance comparable to or better than that of conventional PC-based systems.

Processing methods

Commercial and open source computer vision software packages typically include a range of image processing techniques, such as:

  • Pixel Counter: Counts the number of light or dark pixels
  • Binarization: Converts a grayscale image to binary (white and black pixels)
  • Segmentation: used to find and/or count parts
    • Blob Finding and Analysis: Checks an image for individual blobs of connected pixels (such as a black hole on a gray object) as image reference points. These blobs often represent targets for processing, capture, or manufacturing defects.
    • Robust pattern recognition: search for a pattern of an object that may be rotated, partially hidden by another object, or different in size.
  • Barcode reading: decoding 1D and 2D codes designed to be read or scanned by machines
  • Optical character recognition: automated reading of text such as serial numbers
  • Measurement: Measuring the size of objects in inches or millimeters
  • Edge Detection: Finding the edges of objects
  • Pattern matching: searching, matching, and/or counting specific patterns

In most cases, machine vision systems use a sequential combination of these processing methods to perform a complete inspection. For example, a system that reads a barcode can also check the surface for scratches or damage and measure the length and width of the components being processed.

Machine Vision Applications

The applications of machine vision are varied and cover various areas of activity, including but not limited to the following:

  • Large industrial production
  • Accelerated production of unique products
  • Safety systems in industrial environments
  • Control of pre-fabricated objects (e.g. quality control, error investigation)
  • Visual control and management systems (accounting, barcode reading)
  • Control of automated vehicles
  • Quality control and food inspection

In the automotive industry, machine vision systems are used to guide industrial robots and to inspect vehicle paint surfaces, welds, engine blocks, and many other components for defects.

UDC 004.93"1

Machine vision

Tatyana Vadimovna Petrova, group 4241/3

Machine vision is the application of computer vision to industry and manufacturing. The area of interest of machine vision is digital input/output devices and computer networks for monitoring production equipment. Machine vision has several advantages over human vision, which makes developing this area of science important. This review describes the history of machine vision, the components of a machine vision system, its applications, and the future of the field.


Introduction

Keywords: computer vision, machine vision, production

A person receives the main part of information about the outside world through the visual channel and then very effectively processes the received information using the apparatus of analysis and interpretation of visual information. Therefore, the question arises about the possibility of machine implementation of this process.

Due to the increasing complexity of scientific and technical problems being solved, automatic processing and analysis of visual information are becoming increasingly pressing issues. These technologies are used in highly sought-after areas of science and technology, such as process automation, increasing productivity, improving the quality of manufactured products, control of production equipment, intelligent robotic systems, control systems for moving vehicles, biomedical research and many others. In addition, it can be said that the success of modern business is based mainly on the quality of the products offered. And to ensure this, if we talk about the production of material things, visual control is required.

In what follows, we use the term “machine vision” as the concept that most fully encompasses the range of engineering technologies, methods and algorithms associated with interpreting visual information, as well as with the practical use of the results of this interpretation.


1. History of the development of machine vision

Computer vision emerged as an independent discipline by the end of the 1960s. This direction arose within artificial intelligence at a time when there were still heated debates about the possibility of creating a thinking machine, and it grew out of work on pattern recognition [Zueva, 2008].

A brief history of the development of machine vision is presented in Figure 1.

Fig. 1. History of machine vision

In the history of the development of machine vision, the following stages can be distinguished:

· 1955 - Massachusetts Institute of Technology (MIT) professor Oliver Selfridge published the article “Eyes and Ears for the Computer.” In it, the author put forward the theoretical idea of equipping a computer with sound and image recognition tools.

· 1958 - psychologist Frank Rosenblatt of Cornell University created a computer implementation of the perceptron, a device that models the pattern-recognition circuitry of the human brain. The perceptron was first simulated in 1958, and training it required about half an hour of machine time on an IBM-704 computer. The hardware version, the Mark I Perceptron, was built in 1960 and was intended for visual image recognition [Computer Vision, 2010].

However, the consideration of computer vision problems was rather speculative, since neither the technology nor the mathematical support for solving such complex problems was yet available.

· 1960s - the emergence of the first image processing software systems (mainly to remove noise from photographs taken from aircraft and satellites); applied research in the field of printed character recognition began to develop. However, there were still limitations in the development of this field of science, such as the lack of cheap optical data input systems and the limited, rather narrowly specialized computing systems. The rapid development of computer vision systems throughout the 60s can be explained by the expanding use of computers and the obvious need for faster and more efficient human-computer communication. By the early 60s, computer vision problems mainly covered the area of space research, which required processing a large amount of digital information.

· 1970s - Lawrence Roberts, a graduate student at MIT, put forward the concept of machine construction of three-dimensional images of objects based on the analysis of their two-dimensional images. At this stage, a deeper analysis of image data began, and various approaches to recognizing objects in an image began to develop, such as structural, feature-based, and texture-based methods.

· 1979 - Professor Hans-Helmut Nagel from the University of Hamburg laid the foundations of the theory of dynamic scene analysis, which makes it possible to recognize moving objects in a video stream.

· In the late 1980s, robots were created that were capable of more or less satisfactorily assessing the world around them and independently performing actions in the natural environment.

· The 80s and 90s were marked by the emergence of a new generation of sensors for two-dimensional digital information fields of various physical natures. The development of new measuring systems and methods for recording two-dimensional digital information fields in real time has made it possible to obtain time-stable images generated by these sensors for analysis. Improving the production technologies of these sensors has made it possible to significantly reduce their cost, and therefore significantly expand the scope of their application.

· Since the beginning of the 90s, in the algorithmic aspect, the sequence of image-processing operations has been considered in accordance with the so-called modular paradigm. This paradigm, proposed by D. Marr on the basis of a long-term study of the mechanisms of human visual perception, holds that image processing should proceed through several successive levels of an ascending information pipeline: from an “iconic” representation of objects (raster image, unstructured information) to their symbolic representation (vector and attribute data in structured form, relational structures, etc.) [Wiesilter et al., 2007].

· In the mid-90s, the first commercial automatic vehicle navigation systems appeared. Effective means of computer analysis of movements were developed at the end of the 20th century.

· 2003 - the first fairly reliable corporate facial recognition systems were released onto the market.


2. Problems of computer vision and areas of its application

2.1 Definition of “machine vision”

Machine vision is the application of computer vision to industry and manufacturing. The area of interest of computer vision as an engineering field is digital input/output devices and computer networks designed to control production equipment, such as robotic arms or devices for retrieving defective products.

Machine vision is the study of methods and techniques whereby artificial vision systems can be constructed and usefully employed in practical applications. As such, it embraces both the science and engineering of vision.

Its study includes not only the software but also the hardware environment and the image acquisition techniques needed to apply it. As such, it differs from computer vision, which, judging from most books on the subject, concerns itself with the possible design of the software, without much attention to what goes into an integrated vision system (though modern books on computer vision usually say a fair amount about the "nasty realities" of vision, such as noise elimination and occlusion analysis).

2.2 Machine vision today

Currently, a clear boundary exists between so-called monocular and binocular computer vision. The first area covers research and development related to information coming from a single camera, or from each camera separately. The second covers research and development that deals with information received simultaneously from two or more cameras. Multiple cameras in such systems are used to measure the depth of the observed scene; such systems are called stereo systems.
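The depth measurement performed by a stereo system rests on triangulation: for a rectified camera pair, the depth of a point is inversely proportional to its disparity, Z = f·B/d. A minimal sketch of this relation (the focal length, baseline and disparity values below are purely illustrative):

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair.

    focal_px     -- focal length expressed in pixels
    baseline_m   -- distance between the two camera centers, in meters
    disparity_px -- horizontal pixel shift of the same point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (point must lie in front of both cameras)")
    return focal_px * baseline_m / disparity_px

# A point with 35 px disparity, cameras with f = 700 px and a 10 cm baseline:
print(depth_from_disparity(700, 0.10, 35))  # 2.0 (meters)
```

Note how depth resolution degrades as disparity shrinks: distant points shift by only a fraction of a pixel, which is why stereo systems have a limited working range.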

To date, the theory of computer vision has fully developed as an independent branch of cybernetics, based on a scientific and practical knowledge base. Every year hundreds of books and monographs are published on this topic, dozens of conferences and symposia are held, and various software and hardware are produced. There are a number of scientific and public organizations that support and cover research in the field of modern technologies, including computer vision technologies.

2.3 Main tasks of computer vision

In general, the tasks of computer vision systems include obtaining a digital image, processing the image in order to highlight significant information in the image, and mathematical analysis of the obtained data to solve the assigned problems.

However, computer vision makes it possible to solve many problems, which can be divided into four groups (Fig. 2) [Lysenko, 2007]:


Fig. 2. Computer vision tasks


· Position recognition

The purpose of computer vision in this application is to determine the spatial location (the location of an object relative to an external coordinate system) or the static position of an object (what position the object is in relative to a coordinate system with a reference point within the object itself) and transmit information about the position and orientation of the object to the control system or controller.
An example of such an application would be a material handling robot that is tasked with moving objects of various shapes out of a silo. The intelligent task of machine vision is, for example, to determine the optimal reference coordinate system and its center for localizing the center of gravity of a part. The information obtained allows the robot to properly grasp the part and move it to the proper location.
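In the simplest case, localizing a part for grasping reduces to computing the center of mass of its segmented silhouette. A minimal sketch on a binary mask (pure Python; a real system would work with calibrated camera images and far more robust segmentation):

```python
def mask_centroid(mask):
    """Centroid (row, col) of the foreground pixels in a binary mask.

    mask -- 2D list of 0/1 values, e.g. a thresholded camera image of the part.
    """
    total = sum_r = sum_c = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                total += 1
                sum_r += r
                sum_c += c
    if total == 0:
        raise ValueError("empty mask: no part detected")
    return sum_r / total, sum_c / total

# A 2x2 square part centered in a 4x4 field of view:
part = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_centroid(part))  # (1.5, 1.5)
```

The returned pixel coordinates would then be mapped through the camera calibration into the robot's reference frame before planning the grasp.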

Facial recognition technology based on facial biometrics is the “pinnacle” of video analytics: it poses the most complex problems and uses a wide range of mathematical tools. On the one hand, the biometric system implements the recognition function by establishing a probabilistic connection between the image and the identifiers of people registered in the database. On the other hand, a biometric system requires flawless detection and tracking functions.

Examples of successfully solved problems using video analytics functions:

  • Recognition for the purpose of counting people and vehicles
  • Number recognition (vehicle license plates in transport, numbers on banknotes, documents, etc.)
  • Event detection (movement, crossing of permissible lines and boundaries, presence in restricted zones, objects thrown over a fence, etc.)
  • Detection of dangerous situations (crowds of people, abandoned objects, fires and smoke, etc.)
  • Recognizing human faces and searching them in databases
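Of the events listed, crossing a permissible line is among the simplest to detect: it is enough to check on which side of the line the tracked center of an object lies in consecutive frames. A hedged sketch (the track coordinates below are invented for illustration):

```python
def side_of_line(p, a, b):
    """Sign of the cross product: which side of the line a->b the point p lies on."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)

def crossings(track, a, b):
    """Frame indices at which a tracked point switches sides of the line a->b."""
    events = []
    sides = [side_of_line(p, a, b) for p in track]
    for i in range(1, len(sides)):
        if sides[i] != 0 and sides[i - 1] != 0 and sides[i] != sides[i - 1]:
            events.append(i)
    return events

# Virtual line x = 5 (from (5, 0) to (5, 10)); the object moves left to right.
track = [(2, 3), (4, 3), (6, 4), (8, 4)]
print(crossings(track, (5, 0), (5, 10)))  # [2]
```

A production detector would add temporal smoothing and a minimum-displacement threshold so that tracker jitter near the line does not generate spurious alarms.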

Application of video analytics

The use of video analytics makes it possible to automatically, without human intervention, solve problems during video surveillance that are usually only possible with human vision. This technology is used both to ensure security and to improve business efficiency in trade, the financial sector and transport.

Functions and applications

  • Object recognition - Security, object counting in trade and transport
  • Event detection - Security, personnel monitoring
  • Analysis of object activity - Improving the quality of service

Commercial use of video analytics

Video analytics is often used to obtain an objective assessment of business performance, as it is capable of continuous, automated data collection independent of the human factor, and of generating reports on user request at any time. Video analytics technology is used in retail trade, banks, shopping centers, and by manufacturers of consumer packaged goods (CPG). Video analytics technologies are widely used to solve complex problems of ensuring security and of providing statistical and marketing data. Video analytics analyzes the following parameters:

  • Flow of people and transport
  • The number of objects in the queue and the delay time of people in the queue
  • People activity in the selected area

Counting people and vehicles

To ensure industrial safety, not only classic video surveillance systems are successfully used, but also technologies such as facial recognition and license plate recognition systems. The first facial recognition system was installed in 1998 in the London borough of Newham. In the 2000s, facial recognition systems could identify a person's face with an accuracy of at least 80%; today this figure exceeds 95%. On some recognition tasks, machines now rival or even exceed human performance.

License plate recognition systems can be used at the checkpoint of a manufacturing enterprise. Cameras installed at the entrance to the parking lot not only recognize vehicle license plates, but also analyze, archive and transmit vehicle data to the dispatcher's console, and also report information about the situation in the controlled area.

Until recently, video analytics algorithms were used mainly for detecting events, counting visitors, recognizing dangerous objects and identifying persons to ensure security at various sites. Modern developments in the field of video analytics are capable of solving a wide range of commercial problems. Algorithms can collect and analyze important marketing information in real time (counting people and vehicles, monitoring people’s activity in certain areas, etc.). As analysis technologies develop, the information coming from video surveillance systems becomes more and more valuable and begins to be actively used by business.

Functions of a video analytics system in counting

  • Real-time counting of people and vehicles
  • Collection and analysis of quantitative data collected as a result of counting algorithms

People counting for business purposes is used to calculate several important business performance indicators:

  • CPM (Cost Per Mille, or Cost Per Thousand: sales volume per thousand visitors)
  • SSF (Sales Per Square Foot: number of sales per unit of retail area)
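Both indicators are straightforward to compute once the visitor counter supplies the traffic figure. A minimal sketch (the revenue and visitor numbers below are invented for illustration):

```python
def sales_per_thousand_visitors(revenue: float, visitors: int) -> float:
    """Revenue normalized per 1000 counted visitors (the CPM-style indicator above)."""
    if visitors == 0:
        raise ValueError("no visitors counted")
    return revenue / visitors * 1000

def conversion_rate(buyers: int, visitors: int) -> float:
    """Share of counted visitors who actually made a purchase."""
    if visitors == 0:
        raise ValueError("no visitors counted")
    return buyers / visitors

# Hypothetical day: 2000 counted visitors, 240 purchases, 50 000 revenue.
print(sales_per_thousand_visitors(50_000.0, 2_000))  # 25000.0
print(conversion_rate(240, 2_000))                   # 0.12
```

Tying these figures to the people-counting output is what makes them objective: the denominator comes from the camera, not from manual tallies.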

Business Opportunities

  • Sales forecasting based on data on the actual flow of visitors/customers
  • Assessing business performance, calculating the conversion rate based on statistical data on site traffic
  • Linking the employee motivation system to the conversion rate
  • Analysis of the quality of capacity utilization: retail space, personnel work
  • Assessing the effectiveness of advertising campaigns and investments in PR and marketing based on site traffic data
  • Reducing personnel costs, adjusting the number of personnel per shift and the facility’s work schedule in accordance with the intensity of visitor flow

Automatic video analysis of a limited area

Functions of a video analytics system in perimeter analysis

  • Counting the number of objects in a limited perimeter
  • Identification of objects located in the perimeter by certain characteristics (identification of personnel by uniform, etc.)
  • Calculation of delay time of objects in a given perimeter
  • Monitoring the activity of objects in a given perimeter (detection of movement, facts of absence in the perimeter, etc.)
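The delay-time (dwell-time) calculation listed above can be sketched very simply: given per-frame detections of whether an object is inside the zone, the dwell time is the number of such frames divided by the frame rate. A minimal illustration (the zone, track, and frame rate are invented):

```python
def inside_rect(p, top_left, bottom_right):
    """True if point p = (x, y) falls inside an axis-aligned zone."""
    return (top_left[0] <= p[0] <= bottom_right[0]
            and top_left[1] <= p[1] <= bottom_right[1])

def dwell_time(in_zone, fps: float) -> float:
    """Total time (seconds) an object spent inside the zone.

    in_zone -- per-frame booleans from the detector (True = object inside)
    fps     -- frame rate of the camera
    """
    return sum(in_zone) / fps

# 25 fps camera; an object moving along x = 0..99 stays in the zone x in [10, 59].
track = [(x, 5) for x in range(100)]
flags = [inside_rect(p, (10, 0), (59, 10)) for p in track]
print(dwell_time(flags, 25.0))  # 2.0 seconds
```

Per-object aggregation of these times is what feeds the queue-length and service-quality statistics mentioned earlier.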

Business Opportunities

  • Calculation of the optimal number of service personnel based on data on visitor behavior
  • Recording the activity of personnel for subsequent search in the video archive when analyzing conflict situations
  • Evaluating the effectiveness of advertising campaigns and adjusting them
  • Providing vendors with information on the effectiveness of promotions
  • Prevention of theft of funds and goods (control of cash register areas, warehouse premises, goods acceptance areas, etc.)
  • Analysis of the activity of store visitors/customers in selected zones
  • Calculation of conversion rates for selected departments.

Scientific research in the field of video analytics

Video data analysis is a subset of computer vision and artificial intelligence. Significant research in these areas is being conducted at the University of Calgary, University of Waterloo, Kingston University, Georgia Institute of Technology, Carnegie Mellon University, West Virginia University and British Columbia Institute of Technology.


Development of video analytics in Russia

Scientific research in the field of computer vision and artificial intelligence has been conducted in Russia since the 2000s at research centers and several large universities.

In Russia, until recently, video analytics algorithms were used mainly for detecting events, counting visitors, recognizing dangerous objects and identifying persons in order to ensure security at various sites: protected areas, transport (airports, railway transport, license plate recognition for the traffic police), as well as at government facilities.

Modern developments in the field of video analytics are capable of solving a wide range of commercial problems. Algorithms can collect and analyze important marketing information in real time (counting people and vehicles, analyzing queues, monitoring people’s activity in certain areas). The high accuracy and reliability of the data obtained as a result of the operation of video analytics systems is confirmed by the widespread use of algorithms in business.

Vitaly Kuznetsov, Managing Partner of Office Anatomy: In the computer vision or video analytics market there is too much marketing from developers, little experience from integrators and high expectations from the new product on the part of the customer. As a result, the developer praises the newly created, but still far from ideal, product, the integrator sells it to the client, and the customer, warmed up by pictures from Hollywood blockbusters with spectacular scenes of facial recognition and detection of criminals, is left with disappointment. Everyone is too interested in getting a revolutionary product that will irrevocably change the security systems market.

What is computer vision?

Computer vision is a technology that allows machines to find, track, classify and identify objects by extracting data from images and analyzing the resulting information.

Computer vision is used for object recognition, video analytics, image and video content description, gesture and handwriting recognition, and intelligent image processing.

How is machine vision different from computer vision?

Machine vision uses image analysis to solve industrial problems; machine vision and computer vision are related areas.

Beginners may think that these are different names for the same technology, but this is not the case: computer vision is the general name for a set of technologies, while machine vision is a field of their application.

Computer vision tasks

Machine vision makes it possible to replace manual labor: a robot can inspect product assembly, count and measure objects, read text and numbers, and identify objects.

Machine vision is used in various fields: in medicine, to make diagnoses more accurate; in industry, to reduce the cost of goods through automation; in the automotive industry, for the navigation of driverless vehicles; and in retail, for reading barcodes or counting visitors.

Machine vision systems

Since machine vision is used to solve various industrial problems, depending on what specific problem needs to be solved, special machine vision systems are created.

Typical machine vision systems consist of cameras with suitable optics, processors, light sources, image-processing software and applications, and various sensors.

For example, a sensor detects that a part on the conveyor needs to be checked and triggers a camera, which takes a picture of the part. The image is then sent to a computer, where machine vision software processes it.

After the image is processed, depending on the condition of the part, the program allows or does not allow the part to pass further along the conveyor. That is, if a part is damaged, the software will send a signal to the device to reject it, stop production, or warn a person that there is a defective part.
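The accept/reject decision described above can be sketched as a simple thresholding step. The defect score and threshold here are hypothetical stand-ins for the output of a real image-processing stage:

```python
from enum import Enum

class Verdict(Enum):
    PASS = "pass"      # part continues down the conveyor
    REJECT = "reject"  # part is diverted, or production is stopped / operator warned

def inspect(defect_score: float, threshold: float = 0.2) -> Verdict:
    """Decide whether a part continues down the conveyor.

    defect_score -- output of the (hypothetical) image-processing step:
                    0.0 = flawless, 1.0 = certainly defective
    threshold    -- tolerance configured for the production line
    """
    return Verdict.REJECT if defect_score > threshold else Verdict.PASS

print(inspect(0.05))  # Verdict.PASS
print(inspect(0.70))  # Verdict.REJECT
```

In a real line, the reject verdict would be wired to an actuator (an air jet or pusher) and logged so that defect rates can be tracked per shift.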
