
1 Introduction: Light and Vision Systems (Hyung Il Choi)
This course is a basic introduction to parts of the field of computer vision. This version of the course covers topics in 'early' or 'low' level vision and parts of 'intermediate' level vision. It assumes no background in computer vision, a minimal background in Artificial Intelligence, and only basic concepts in calculus, linear algebra, and probability theory.

2 Acknowledgements The slides in this lecture were adapted and modified from lectures by Professor Allen Hanson, University of Massachusetts at Amherst.

3 Various Kinds of Images
Visible-light reflectance images (grayscale or color): film, video camera (tape), digital camera (memory)
Motion video (camcorder)
Stereo images (2 cameras)
Range (depth) images (range finder)
Non-visible-light images: infrared (IR), ultraviolet (UV), microwave, and others
There are many different kinds of sensors available which produce two-dimensional distributions of data related to a scene. Traditional sensors, such as film or video cameras, are obvious choices for rendering a scene in a manner consistent with the way we view it (that is, using visible light). The output of either of these can be converted to digital form (this topic is covered in the next section). These sensors can also be made sensitive to radiation outside the range of normal human vision, such as the near infra-red and ultra-violet. By taking multiple images as the sensor moves, motion sequences or stereo image pairs can be easily obtained. However, there are many other types of sensors which can be used in vision. Range sensors directly measure the depth to a point (actually, they measure average depth over a small area). By scanning a scene, images are produced which represent depth distributions. These images contain information which may not be directly available in more conventional images. Finally, there is no reason to limit our view of the world to only visible light when we can build sensors that are sensitive to ultra-violet light, infra-red light, the radio spectrum, x-rays, electron beams, etc. In some vision systems, multiple sensors might be used to simultaneously record information from many sensory 'bands'; these images may be spatially registered. Of course, the use of multiple sensors produces even more data to be processed.

4 Electromagnetic Waves c = fλ, E ∝ f. Visible spectrum: 400 nm (violet) to 700 nm (red)
An electromagnetic wave, although it carries no mass, does carry energy. It also has momentum and can exert pressure (known as radiation pressure). The reason the tails of comets point away from the Sun is the radiation pressure exerted on the tail by the light (and other forms of radiation) from the Sun. The energy carried by an electromagnetic wave is proportional to the frequency of the wave. The wavelength and frequency of the wave are connected via the speed of light: c = fλ. Electromagnetic waves are split into different categories based on their frequency (or, equivalently, on their wavelength). In other words, we split up the electromagnetic spectrum based on frequency. Visible light, for example, ranges from violet to red. Violet light has a wavelength of 400 nm and a frequency of 7.5 × 10^14 Hz. Red light has a wavelength of 700 nm and a frequency of 4.3 × 10^14 Hz. Any electromagnetic wave with a frequency (or wavelength) between those extremes can be seen by humans. Visible light makes up a very small part of the full electromagnetic spectrum. Electromagnetic waves of higher energy than visible light (higher frequency, shorter wavelength) include ultraviolet light, X-rays, and gamma rays. Lower-energy waves (lower frequency, longer wavelength) include infrared light, microwaves, and radio and television waves.
Shortest wavelengths / highest frequency: Gamma rays are radiation from nuclear decay, emitted when a nucleus changes from an excited energy state to a lower energy state. Gamma rays are typically waves of frequencies greater than 10^19 Hz. They have high energies (greater than 10^4 eV per photon) and extremely short wavelengths (less than about 3 × 10^-11 m, by c = fλ). Gamma rays can penetrate nearly all materials and are therefore difficult to detect. Gamma rays have mostly been detected from activity in space, such as the Crab Nebula and the Vela Pulsar. The highest frequency of gamma rays that has been detected is 10^30 Hz, measured from diffuse gamma-ray emissions.
Longest wavelength / lowest frequency: Several textbooks cite the frequency of the lowest electromagnetic waves as on the order of 10^2 Hertz (Hz), and such waves are classified as Extremely Low Frequency (ELF). However, it has been discovered that the frequency of the lowest electromagnetic waves is on the order of 10^-3 Hz (millihertz, mHz); these are known as micropulsations. Micropulsations or geomagnetic pulsations are responses to changes in the magnetosphere. The magnetosphere is a cavity in the solar wind, which is the result of the geomagnetic field (Earth's magnetic field) impeding the direct entry of the ionized gas (plasma) of the solar wind into the cavity. Micropulsations were first observed and published by Balfour Stewart, who described pulsations with frequencies ranging from 3 mHz to 30 mHz. Today, geomagnetic pulsations cover the frequency range from 1 mHz to 1 Hz. Pulsations are divided into two classes, continuous and irregular, each of which is further divided according to the period of the pulsations.
Glossary: nebula - a cloudlike mass of gas or dust among the stars (e.g., the Crab Nebula); Vela Pulsar; micropulsations.
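To make these numbers easy to check, here is a minimal Python sketch (my addition, not part of the original slides): it converts a wavelength to frequency via c = fλ and to photon energy via the Planck relation E = hf. The rounded physical constants are standard values.

```python
# Minimal sketch: wavelength -> frequency and photon energy for an EM wave.
# Uses c = f * lambda and the Planck relation E = h * f.

C = 3.0e8        # speed of light, m/s (rounded)
H = 6.626e-34    # Planck constant, J*s
EV = 1.602e-19   # joules per electron-volt

def frequency_hz(wavelength_m):
    """Frequency of an EM wave from its wavelength (c = f * lambda)."""
    return C / wavelength_m

def photon_energy_ev(wavelength_m):
    """Photon energy in eV (E = h * f); higher frequency means higher energy."""
    return H * frequency_hz(wavelength_m) / EV

for name, wl in [("violet", 400e-9), ("red", 700e-9)]:
    print(f"{name}: f = {frequency_hz(wl):.2e} Hz, "
          f"E = {photon_energy_ev(wl):.2f} eV")
# violet: f = 7.50e+14 Hz, E = 3.10 eV
# red:    f = 4.29e+14 Hz, E = 1.77 eV
```

The printed frequencies match the 7.5 × 10^14 Hz and 4.3 × 10^14 Hz figures quoted in the notes above.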

5 The Human Eye
Cornea - the transparent part of the coat of the eyeball, covering the iris and the pupil
Iris - the colored part of the eye around the pupil
Lens - the transparent structure behind the pupil that focuses light onto the retina
Pupil - the dark central opening of the iris of the eye
Retina - the sensory membrane that lines the back of the eye

6 The Eye
Retina: rods (low-light, night vision), cones (color vision), synapses (the point at which a nerve impulse passes from one neuron to another), and optic nerve fibers

7 Film, Video, Digital Cameras
Grayscale (visible-light reflectance)
Color (visible-light reflectance in 3 channels: red, green, blue)

8 Across the EM Spectrum Crab Nebula
Astronomy Picture of the Day: Today's Picture: June 24, 1995 The Crab Nebula and Geminga in Gamma Rays Picture Credit: NASA, Compton Gamma Ray Observatory Explanation: What if you could "see" in gamma rays? If you could, these two spinning neutron stars, or pulsars, would be among the brightest objects in the sky. This computer-processed image shows the Crab Nebula pulsar (below and right of center) and the Geminga pulsar (above and left of center) in the "light" of gamma rays. Gamma-ray photons are more than 10,000 times more energetic than visible-light photons and are blocked from the Earth's surface by the atmosphere. This image was produced by the high-energy gamma-ray telescope "EGRET" on board NASA's orbiting Compton Observatory satellite. The Astronomy Pictures of the Day are available on the World Wide Web. Each day the site features a different picture of some part of our fascinating universe, along with a brief explanation written by a professional astronomer. The Astronomy Picture of the Day is brought to you by Robert Nemiroff and Jerry Bonnell. Original material on this page is copyrighted to Robert J. Nemiroff and Jerry T. Bonnell.

9 Across the EM Spectrum Medical X-Rays

10 Across the EM Spectrum Flower patterns photographed in ultraviolet (UV) Dandelion - UV
Potentilla The UV range starts below 400 nm, i.e. deep violet. Usually UV radiation is divided into three main ranges: UV-A (315-400 nm), UV-B (280-315 nm), and UV-C (below 280 nm). We humans are effectively "blind" to this radiation, and it is felt mainly indirectly through sunburns, irritated eyes, and an elevated risk of skin cancer. The shorter wavelengths are the most dangerous in this respect. Some perianth parts have pigmentation patterns which guide pollinators directly to reproductive structures. These are not always apparent with visible light. Some insects can perceive ultraviolet patterns that are invisible to humans.

11 Across the EM Spectrum Messier 101 in Ultraviolet
M101: An Ultraviolet View Credit: Astro 2, UIT, NASA Explanation: This picture of the giant spiral galaxy Messier 101 (M101) was taken by the Ultraviolet Imaging Telescope (UIT). UIT flew into orbit as part of the Astro 2 mission on board the Space Shuttle Endeavour in March 1995. The image has been processed so that the colors (dark purple through white) represent an increasing intensity of ultraviolet light. Pictures of galaxies like this one show mainly clouds of gas containing newly formed stars many times more massive than the sun, which glow strongly in the ultraviolet. In contrast, visible-light pictures of galaxies tend to be dominated by the yellow and red light of older stars. Ultraviolet light, invisible to the human eye, is blocked by ozone in the atmosphere, so ultraviolet pictures of celestial objects must be taken from space. M101 is a mere 22 million light-years away in the constellation Ursa Major. Its popular moniker is the Pinwheel Galaxy.

12 Across the EM Spectrum A typical visible-light image

13 Across the EM Spectrum Rangefinder using laser scanning

14 Across the EM Spectrum Infrared (IR): near, medium, far (~heat)
Brightness shows magnitude. Using a Sony MicroMV cam to show a real imaging demo with the projector.

15 Across the EM Spectrum Infrared (IR): near, medium, far (~heat)
Pseudo-colored IR images: warm = red, cool = blue

16 Across the EM Spectrum Microwave images: Synthetic Aperture Radar (SAR)
Spaceborne Imaging Radar-C and X-Band Synthetic Aperture Radar (SIR-C/X-SAR) is part of NASA's Mission to Planet Earth. The radars illuminate Earth with microwaves, allowing detailed observations at any time, regardless of weather or sunlight conditions. SIR-C/X-SAR uses three microwave wavelengths: L-band (24 cm), C-band (6 cm), and X-band (3 cm). The multi-frequency data will be used by the international scientific community to better understand the global environment and how it is changing. The SIR-C/X-SAR data, complemented by aircraft and ground studies, will give scientists clearer insights into those environmental changes which are caused by nature and those changes which are induced by human activity. SIR-C was developed by NASA's Jet Propulsion Laboratory. X-SAR was developed by the Dornier and Alenia Spazio companies for the German space agency, Deutsche Agentur fuer Raumfahrtangelegenheiten (DARA), and the Italian space agency, Agenzia Spaziale Italiana (ASI), with the Deutsche Forschungsanstalt fuer Luft und Raumfahrt e.V. (DLR) the major partner in science, operations, and data processing of X-SAR. The colors assigned to the radar frequencies and polarizations are as follows: red is L-band, horizontally transmitted, vertically received; green is C-band, horizontally transmitted, vertically received; and blue is the ratio of C-band to L-band, horizontally transmitted and received. Commonly used radar bands (wavelengths computed from λ = c/f): Ka band: 40,000-26,000 MHz (0.75-1.15 cm); K band: 26,500-18,500 MHz (1.13-1.62 cm); X band: 12,500-8,000 MHz (2.4-3.75 cm); C band: 8,000-4,000 MHz (3.75-7.5 cm); L band: 2,000-1,000 MHz (15-30 cm); P band: 1,000-300 MHz (30-100 cm). Image captions: San Fernando Valley; Tibet: Lhasa River; Athens, Greece; Thailand: Phang Hoei Range. Color key: red = L-band (24 cm), green = C-band (6 cm), blue = C/L ratio.

17 Across the EM Spectrum Low Altitude Interferometric Synthetic Aperture Radar (IFSAR)
The image on the left is a section of Albuquerque, New Mexico, shown the way an interferometric SAR (IFSAR) 'sees' an urban region. When compared to the image on the right, which was derived from aerial photography, an obvious reaction is that the photo-derived product is superior. But what makes this IFSAR result a significant breakthrough is that it can be produced automatically from the source data in a matter of minutes, whereas it requires several days to manually produce the height map from aerial photography. An IFSAR is a radar system that utilizes two standard antennas oriented to form a baseline orthogonal to the platform flight direction. Using this configuration, the radar data can be processed so that each element in the resultant image contains information on the local topography. The IFSAR data shown above was produced by Sandia National Laboratories using their airborne Ku-band IFSAR system. It has approximately 1-meter horizontal resolution with better than 0.5-meter vertical accuracy. Captions: IFSAR elevation, automatic, in minutes; elevation from aerial stereo, manual, several days.

18 Across the EM Spectrum Radio Waves (images of the cosmos from radio telescopes)
Distant Galaxies in Radio Vision Credit: M. Garrett (JIVE), T. Muxlow and S. Garrington (Jodrell Bank), EVN Explanation: Radio waves, like visible light, are electromagnetic radiation, and radio telescopes can "see": their signals translate into radio images of the cosmos. While individually even the largest radio telescopes have very blurry vision compared to their optical counterparts, networks of radio telescopes can combine signals to produce sharper pictures. In fact, using an NRAO supercomputer in New Mexico, USA, and a technique called VLBI (Very Long Baseline Interferometry), the European network of radio telescopes (EVN) has produced pictures of distant galaxies at a resolution some three times higher than the Hubble Space Telescope. Penetrating obscuring dust, the false-color EVN radio images are inset above according to their relative location in an optical image of the famous Hubble Deep Field region of the sky. (Yellow lines superimposed on the optical image are radio intensity contours from a single telescope.) The bright cosmic radio source in the middle of each inset corresponds to a galaxy. Impressively, the radio sources appear to be so small, less than about 600 light-years across in actual size, that they are thought to be associated with massive central black holes in the distant deep field galaxies.

19 Stereo Geometry Single Camera (no stereo)

20 Stereo Geometry
Diagram: a scene point P(X,Y,Z) projects to pl(x,y) on the left camera's film plane and to pr(x,y) on the right camera's film plane; each camera has an optical center and focal length f, and the two optical centers are separated by the baseline B.
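The depth formula summarized on the next slide follows directly from this geometry. A short worked derivation under the usual rectified-stereo assumption (optical axes parallel, baseline B along the x axis; this sign convention is my assumption, since the slides do not fix one):

```latex
% Perspective projection in each (rectified) camera,
% with the right camera's optical center offset by B along x:
\[
x_l = \frac{fX}{Z}, \qquad x_r = \frac{f(X-B)}{Z}
\]
% Subtracting gives the disparity, so depth is inversely
% proportional to disparity:
\[
d = x_l - x_r = \frac{fB}{Z}
\;\Longrightarrow\;
Z = \frac{fB}{d}.
\]
```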

21 Stereo Geometry LEFT IMAGE RIGHT IMAGE P
Pl(xl,yl), Pr(xr,yr). Disparity d = xr - xl encodes depth: for rectified cameras Z = fB/|d|, so depth is inversely proportional to disparity (nearby points shift a lot between the two images, distant points shift little).
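A minimal Python sketch of this relation (my illustration; the focal length, baseline, and matched x coordinates are invented values, not from the lecture):

```python
# Minimal sketch: recover depth from stereo disparity, Z = f*B/|d|.
# Assumes rectified cameras; f, B, and the matched x-coordinates
# below are hypothetical illustration values.

def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth (same units as the baseline) of a point matched in both images."""
    disparity = abs(x_left - x_right)   # in pixels
    if disparity == 0:
        return float("inf")             # zero disparity: effectively infinite depth
    return focal_px * baseline_m / disparity

f_px = 800.0      # focal length expressed in pixels
B_m = 0.12        # 12 cm baseline between the two cameras

# A near point shifts a lot between views; a far point shifts a little.
print(depth_from_disparity(420.0, 380.0, f_px, B_m))  # d = 40 px -> 2.4 m
print(depth_from_disparity(402.0, 398.0, f_px, B_m))  # d =  4 px -> 24.0 m
```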

22 Stereo Images A Short Digression: Stereoscopes

23 Stereo Images Darjeeling Suspension Bridge
Red/blue (anaglyph) stereo images

24 3D glasses: red/blue (anaglyph) glasses, Dolby 3D glasses, polarized glasses, shutter glasses

25 Picture of you?

26 Range Sensors Using light-striping patterns
For a breast cancer study. David B. Cox, Robyn Owens and Peter Hartmann, Department of Biochemistry, University of Western Australia

27 Mosaics A mosaic image assembled from multiple images

28 Mosaics Stabilized Video

29 Mosaics Brazilian forest, made at the UMass Computer Vision Lab (CVL)

30 Why is Vision Difficult?
Great variation within the same object class: color, texture, size, shape, parts, and relations
Great variation in the image-formation process: lighting (highlights, shadows, brightness, contrast); projective distortion, point of view, occlusion; noise, sensor and optical characteristics
Enormous data volume: 1 minute of 1024x768 color video = 4.2 gigabytes uncompressed (the arithmetic is worked out after these notes)
One of the main difficulties in vision is that the system must contain models of what it is 'capable' of seeing and must ultimately select the correct set of models from the information contained in the image. The modeling aspect of computer vision is difficult due to the enormous variations in the geometric and functional descriptions of what we normally think of as 'objects'. For example, consider two fairly simple concepts: house and chair. Houses come in widely varying colors, sizes, and shapes. There are many different styles of houses, such as modern, victorian, ranch, and saltbox. All look visually distinctive, even within a style. Somehow, a system must capture what is true in general of the object class 'house' as well as representing specific instances of the class (such as 'my house'). Objects may also be defined by function as opposed to geometry. For example, at a very abstract level, what is the purpose of a chair? Clearly, it is something to sit in or on, but almost anything that serves this function can be used as a chair and can be defined as a chair in some sense, particularly if all we are looking for is some place to sit. To make matters even more difficult, the embedding of an object in a scene and the imaging process itself may introduce many different kinds of noise and distortions. Objects may be partially occluded by other objects, the scene may have particularly high contrast or be very bright or very dark, the sensor may be particularly noisy, lenses may be blurry, and so forth. Finally, vision entails processing massive amounts of data, often repeatedly. One minute of high-resolution video, similar to that obtained from HDTV, is about 4.2 gigabytes uncompressed. Compression may reduce the amount of data substantially, but it introduces an additional set of problems that we will discuss later.
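As a sanity check on the 4.2 GB figure, here is the arithmetic as a short Python sketch (assuming 24-bit RGB and 30 frames per second, which the slide does not state explicitly):

```python
# Uncompressed data rate for 1 minute of 1024x768 color video.
# Assumes 3 bytes/pixel (24-bit RGB) and 30 frames/s; the slide
# quotes the result (4.2 GB) but not these two assumptions.

width, height = 1024, 768
bytes_per_pixel = 3          # 8 bits each for red, green, blue
fps = 30
seconds = 60

bytes_per_frame = width * height * bytes_per_pixel      # ~2.36 MB per frame
total = bytes_per_frame * fps * seconds                 # one minute of video
print(f"{total / 1e9:.2f} GB")                          # -> 4.25 GB
```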

31 The Need for Knowledge
Diagram: 'Knowledge' surrounded by the kinds of information it must cover: variation, motion, context, function, shape, purpose, structure, size, specific objects, and generic objects.
To the extent that knowledge is required in vision to achieve the goals set out earlier, in which a globally consistent interpretation (description) must be inferred from partial or incomplete information, vision has components similar to current problems in artificial intelligence. These include representation of general knowledge about objects in scenes, the potential ambiguities in interpretation arising from the possibly incomplete and erroneous data extracted from the image, variations in structure, size, and/or shape of specific instances of general object classes, and the requirement of globally consistent interpretations. That humans make strong assumptions about what is being viewed in the image in order to arrive at an interpretation is demonstrated by the figure embedded in the slide. It should be immediately recognizable....

32 The Figure Revealed .....as a side-lit and shadowed three-dimensional cube. There is an assumption that the two-dimensional object being viewed results from some projection of a three-dimensional object. Under this assumption, the interpretation becomes immediate and obvious, even though the original object was a piece of black construction paper laid on a white page!

33 The Effect of Context
This is a reproduction of a famous example in the pattern recognition literature which demonstrates the effect of context on the interpretation of an object, in this case a letter. It is important to realize that in this example, the shape of the letter H in THE is identical to the shape of the A in CAT. It is only the surrounding context that causes the shift in its interpretation. How important the effect of context is on visual interpretation in humans in general is an open question in the psychological literature. However, there are numerous examples similar to this in which it is possible to demonstrate a shift in interpretation due to changes in the surrounding context. On the other hand, in psychological studies on human subjects, Biederman has demonstrated that humans are remarkably good at recognizing objects taken out of context or found in unusual or unlikely contexts. However, Biederman did not directly address the issue of shifts in interpretation when ambiguous objects were placed in different but likely contexts.

34 The Effect of Context - 2

35 Context, cont. ….a collection of objects:
The objects here are more or less recognizable as hats.....

36 Context The objects as hats:
Here the objects are seen in a context in which they are clearly hats.

37 Context And as something else…..
‘To interpret something is to give it meaning in context.’ ....while here they are interpreted completely differently because of the surrounding context. Consequently, we can raise the issue as to whether an interpretation is valid only in specific contexts.

38 Vision System Components
Low-level processing stage:
Image formation and representation (as arrays)
Feature extraction and representation
No domain knowledge is used
Given the preceding discussion, we are now in a position to hypothesize the various components required in a vision system. While the knowledge base provides general world models and constraints, it is the image or the image sequence that provides the window into a specific world, and it is from the image that the descriptive properties of the image events corresponding to world events must be derived. The low-level system is responsible for making the initial measurements on the image data, creating new representations of the image data (either as derived images or as intermediate-level tokens), and deriving descriptions of the tokens (e.g. this area of the image is predominantly green and lightly textured). Since we do not usually know very much about what is in the image at this point, these initial processes cannot make use of world knowledge or expectations during processing. From our earlier discussions, one would expect that the abstractions of the image data created from this level would include fairly primitive descriptions in terms of image edges, image areas of more or less constant brightness, color, texture, or combinations of features like these, local surface information such as curvature, location of corners, and the like. Each element of these abstractions in turn has a set of features specific to it. Areas can be described by their boundary, the average brightness of the area, average color, specific texture descriptions, and other features. Edges can be described by location and orientation features, contrast across the edge, etc. Diagram: IMAGE (numbers) -> DESCRIPTION (symbols).
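To make the low-level stage concrete, here is a minimal NumPy sketch (my illustration, not from the lecture) that computes exactly the kind of knowledge-free primitive description the notes mention: per-pixel edge strength and orientation from Sobel differences.

```python
import numpy as np

# Minimal low-level sketch: per-pixel edge strength and orientation
# via Sobel differences. No domain knowledge is used -- the output is
# just a new array-form representation derived from the input image.

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d(img, kernel):
    """Tiny sliding-window filter (cross-correlation, 'valid' output);
    kernel orientation does not affect the edge magnitude below."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def edge_features(img):
    """Return (magnitude, orientation in radians) at each interior pixel."""
    gx = filter2d(img, SOBEL_X)
    gy = filter2d(img, SOBEL_Y)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

# A synthetic image with a vertical brightness step.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
mag, theta = edge_features(img)
print(mag.max())   # strongest response lies along the step edge
```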

39 Vision System Components
Intermediate-level processing stage:
Represent the extracted features in symbolic form
Group features (grouping)
Design how domain knowledge will be brought to bear
The intermediate level sits between bottom-up (data-directed) and top-down (knowledge-directed) control
The intermediate level initially contains symbolic representations of primitive image descriptions. Processes at the intermediate level are responsible for creating additional descriptions at this level and for creating initial object hypotheses at the high level. As we have already pointed out, there must be a tight coupling between the derived image descriptions and real world knowledge through the common descriptive vocabulary used for each. It must be possible to match image properties to world properties in a way that relates the image to world models. Much of the work in vision is concerned with the derivation of image descriptions at a level far removed from the initial primitive descriptions available from the low-level processes and with identification of generally applicable constraints that can be used to group the primitives into more abstract stable descriptions that better match the primitives used in the knowledge representation. The processes at the intermediate level which produce other intermediate-level tokens are generally viewed as grouping processes, which construct new descriptions from existing tokens using relational criteria similar to those proposed as operational in human grouping behavior. For example, tokens (or objects) which satisfy nearness and similarity constraints are often seen as a group. In many cases, the group itself has emergent properties which are not evident in the individual components. In the curve example given earlier, the curve has measurable properties, such as curvature, which none of the edge elements making up the curve possess. The grouping processes are often combinatorially explosive, and consequently are rarely applied blindly across the entire image. This implies that the interpretation control mechanisms must judiciously select when and where to apply them. Diagram: IMAGE -> INTERMEDIATE DESCRIPTIONS -> KNOWLEDGE.
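To make the grouping idea concrete, here is a small Python sketch (my illustration; the token format and the thresholds are invented) that links edge tokens into groups whenever they satisfy nearness and orientation-similarity constraints, in the spirit of the perceptual-grouping criteria described above.

```python
import math

# Sketch of intermediate-level grouping: edge tokens (x, y, orientation)
# are linked whenever they are near each other AND similarly oriented.
# Thresholds and tokens are invented illustration values.

NEAR = 2.0               # max distance between grouped tokens (pixels)
SIM = math.radians(20)   # max orientation difference (radians)

def compatible(a, b):
    (xa, ya, ta), (xb, yb, tb) = a, b
    close = math.hypot(xa - xb, ya - yb) <= NEAR
    similar = abs(ta - tb) <= SIM
    return close and similar

def group_tokens(tokens):
    """Greedy connected-components grouping over the compatibility relation."""
    groups, unassigned = [], list(tokens)
    while unassigned:
        seed = unassigned.pop()
        group, frontier = [seed], [seed]
        while frontier:
            cur = frontier.pop()
            for t in [t for t in unassigned if compatible(cur, t)]:
                unassigned.remove(t)
                group.append(t)
                frontier.append(t)
        groups.append(group)
    return groups

# Two nearly-collinear chains of edge elements plus one stray token.
tokens = [(0, 0, 0.0), (1.5, 0.1, 0.05), (3.0, 0.2, 0.1),
          (10, 10, 1.2), (11.5, 10.1, 1.25),
          (30, 0, 0.0)]
for g in group_tokens(tokens):
    print(g)   # three groups: the two chains and the stray token
```

Note that a group built this way can have emergent properties (overall length, curvature) that no individual token possesses, as the notes point out for the curve example.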

40 Vision System Components
High-level (interpretation) processing stage:
How to represent domain knowledge: objects, object parts, expected scenarios (relations), specializations
How to design inference: beliefs, partial matches
How to design control
Requirements at the high level have already been discussed somewhat. The representation of generic world knowledge must be structured to facilitate the matching processes which underlie interpretation. What these representations might be, and how they might be used during matching, is the focus of the second part of this course. One of the reasons for abstracting the descriptions into 'units' of larger size is to reduce the size of the search space when the 'units' are matched to the knowledge base. A system that produces multiple internally consistent interpretations can be thought of as being uncertain as to what it sees, because there is ambiguity as to which interpretation is the 'correct' one. A major factor contributing to this ambiguity is the incomplete and possibly inaccurate descriptions obtained from the low- and intermediate-level systems. The ambiguity can be overcome to some extent by combining information from different independent sources (like getting a second medical opinion) and generating a consensus opinion. A major problem in vision has been to develop mechanisms which can combine several pieces of uncertain information into an overall belief in the resulting interpretation. Treating object hypotheses as evidence for a set of models and confidence values as belief in the evidence leads to the general idea of inferencing mechanisms operating over the knowledge base and the evolving interpretation.
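The notes call for mechanisms that combine several pieces of uncertain evidence into an overall belief. One simple way to sketch this (my illustration; the lecture does not commit to a particular inference calculus) is to treat independent pieces of evidence as likelihood ratios and combine them with Bayes' rule:

```python
# Sketch: combine independent, uncertain evidence for an object
# hypothesis into a single belief using Bayes' rule under a
# conditional-independence assumption. All numbers are invented.

def combine_beliefs(prior, likelihood_ratios):
    """Posterior odds = prior odds * product of likelihood ratios."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

# Hypothesis: "this region is a house".
prior = 0.10                    # belief before examining any evidence
# Each ratio is P(evidence | house) / P(evidence | not house):
evidence = [4.0,                # roof-like region boundary found
            3.0,                # rectangular window-like subregions
            0.5]                # color histogram mildly atypical
print(f"belief = {combine_beliefs(prior, evidence):.2f}")  # -> 0.40
```

Supporting evidence raises the belief and conflicting evidence lowers it, which is one way to read the notes' idea of generating a consensus opinion from independent sources.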

