A 128$\times$128 120 dB 15 $\mu$s Latency Asynchronous Temporal Contrast Vision Sensor P. Lichtsteiner, C. Posch, Tobi Delbrück|IEEE Journal of Solid-State Circuits|2008 This paper describes a 128 × 128 pixel CMOS vision sensor. Each pixel independently and in continuous time quantizes local relative intensity changes to generate spike events. These events appear at the output of the sensor as an asynchronous stream of digital pixel addresses. These address-events signify scene reflectance change and have sub-millisecond timing precision. The output data rate depends on the dynamic content of the scene and is typically orders of magnitude lower than those of conventional frame-based imagers. By combining an active continuous-time front-end logarithmic photoreceptor with a self-timed switched-capacitor differencing circuit, the sensor achieves an array mismatch of 2.1% in relative intensity event threshold and a pixel bandwidth of 3 kHz under 1 klux scene illumination. Dynamic range is > 120 dB and chip power consumption is 23 mW. Event latency shows weak light dependency with a minimum of 15 μs at > 1 klux pixel illumination. The sensor is built in a 0.35 μm 4M2P process. It has 40 × 40 μm² pixels with 9.4% fill factor. By providing high pixel bandwidth, wide dynamic range, and precisely timed sparse digital output, this silicon retina provides an attractive combination of characteristics for low-latency dynamic vision under uncontrolled illumination with low post-processing requirements.
A QVGA 143 dB Dynamic Range Frame-Free PWM Image Sensor With Lossless Pixel-Level Video Compression and Time-Domain CDS C. Posch, Daniel Matolin, Rainer Wohlgenannt|IEEE Journal of Solid-State Circuits|2010 The biomimetic CMOS dynamic vision and image sensor described in this paper is based on a QVGA (304×240) array of fully autonomous pixels containing event-based change detection and pulse-width-modulation (PWM) imaging circuitry. Exposure measurements are initiated and carried out locally by the individual pixel that has detected a change of brightness in its field-of-view. Pixels do not rely on external timing signals and independently and asynchronously request access to an (asynchronous arbitrated) output channel when they have new grayscale values to communicate. Pixels that are not stimulated visually do not produce output. The visual information acquired from the scene, temporal contrast and grayscale data, are communicated in the form of asynchronous address-events (AER), with the grayscale values being encoded in inter-event intervals. The pixel-autonomous and massively parallel operation ideally results in lossless video compression through complete temporal redundancy suppression at the pixel level. Compression factors depend on scene activity and peak at ~1000 for static scenes. Due to the time-based encoding of the illumination information, very high dynamic range - intra-scene DR of 143 dB static and 125 dB at 30 fps equivalent temporal resolution - is achieved. A novel time-domain correlated double sampling (TCDS) method yields array FPN of <0.25% rms. SNR is >56 dB (9.3 bit) for >10 Lx illuminance.
Retinomorphic Event-Based Vision Sensors: Bioinspired Cameras With Spiking Output State-of-the-art image sensors suffer from significant limitations imposed by their very principle of operation. These sensors acquire the visual information as a series of “snapshot” images, recorded at discrete points in time. Visual information gets time quantized at a predetermined frame rate which has no relation to the dynamics present in the scene. Furthermore, each recorded frame conveys the information from all pixels, regardless of whether this information, or a part of it, has changed since the last frame had been acquired. This acquisition method limits the temporal resolution, potentially missing important information, and leads to redundancy in the recorded image data, unnecessarily inflating data rate and volume. Biology is leading the way to a more efficient style of image acquisition. Biological vision systems are driven by events happening within the scene in view, and not, like image sensors, by artificially created timing and control signals. Translating the frameless paradigm of biological vision to artificial imaging systems implies that control over the acquisition of visual information is no longer being imposed externally to an array of pixels but the decision making is transferred to the single pixel that handles its own information individually. In this paper, recent developments in bioinspired, neuromorphic optical sensing and artificial vision are presented and discussed. It is suggested that bioinspired vision systems have the potential to outperform conventional, frame-based vision systems in many application fields and to establish new benchmarks in terms of redundancy suppression and data compression, dynamic range, temporal resolution, and power efficiency. Demanding vision tasks such as real-time 3-D mapping, complex multiobject tracking, or fast visual feedback loops for sensory-motor action, tasks that often pose severe, sometimes insurmountable, challenges to conventional artificial vision systems, are in reach using bioinspired vision sensing and processing techniques.
HFirst: A Temporal Approach to Object Recognition Garrick Orchard, Cédric Meyer, Ralph Etienne‐Cummings et al.|IEEE Transactions on Pattern Analysis and Machine Intelligence|2015 This paper introduces a spiking hierarchical model for object recognition which utilizes the precise timing information inherently present in the output of biologically inspired asynchronous address event representation (AER) vision sensors. The asynchronous nature of these systems frees computation and communication from the rigid predetermined timing enforced by system clocks in conventional systems. Freedom from rigid timing constraints opens the possibility of using true timing to our advantage in computation. We show not only how timing can be used in object recognition, but also how it can in fact simplify computation. Specifically, we rely on a simple temporal-winner-take-all rather than more computationally intensive synchronous operations typically used in biologically inspired neural networks for object recognition. This approach to visual computation represents a major paradigm shift from conventional clocked systems and can find application in other sensory modalities and computational tasks. We showcase effectiveness of the approach by achieving the highest reported accuracy to date (97.5% ± 3.5%) for a previously published four class card pip recognition task and an accuracy of 84.9% ± 1.9% for a new more difficult 36 class character recognition task.
A 128 X 128 120db 30mw asynchronous vision sensor that responds to relative intensity change A vision sensor responds to temporal contrast with asynchronous output. Each pixel independently and continuously quantizes changes in log intensity. The 128×128-pixel chip has 120dB illumination operating range and consumes 30mW. Pixels respond in <100 μs at 1klux scene illumination with <10% contrast-threshold FPN.
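
Both 128×128 sensor abstracts above describe pixels that quantize changes in log intensity into events. The following is a minimal discrete-time sketch of that idea for a single pixel, not the circuit described in the papers: the real pixel operates in continuous time, and the function name `temporal_contrast_events`, the sampled `(t, intensity)` input, and the threshold value of 0.15 are all assumptions made here for illustration.

```python
import math


def temporal_contrast_events(samples, threshold=0.15):
    """Illustrative single-pixel model (hypothetical helper, not from the papers).

    samples   -- sequence of (time, intensity) pairs with intensity > 0
    threshold -- assumed event threshold in natural-log intensity units
    Returns a list of (time, polarity) events, polarity +1 (ON) or -1 (OFF).
    """
    events = []
    it = iter(samples)
    _, first_intensity = next(it)
    # Reference level: log intensity memorized at the last event (or at start).
    ref = math.log(first_intensity)
    for t, intensity in it:
        diff = math.log(intensity) - ref
        # Emit one event per threshold crossing, moving the reference each time.
        while abs(diff) >= threshold:
            polarity = 1 if diff > 0 else -1
            events.append((t, polarity))
            ref += polarity * threshold
            diff = math.log(intensity) - ref
    return events


if __name__ == "__main__":
    # A doubling of intensity, at two very different absolute light levels.
    dim = temporal_contrast_events([(0.0, 1.0), (1.0, 2.0)])
    bright = temporal_contrast_events([(0.0, 1000.0), (1.0, 2000.0)])
    print(len(dim), len(bright))
```

Because the comparison is made on log intensity, the same relative change yields the same number of events at either light level in this sketch, which is the property the abstracts refer to as responding to relative intensity change. An unchanging input produces no events at all.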