A memristor-CMOS hybrid architecture concept for on-line template matching

Alexander Serb*, Christos Papavassiliou†, Themistoklis Prodromakis*
*Electronics and Computer Science Department, University of Southampton, SO17 1BJ, UK.
†Electrical and Electronic Engineering Department, Imperial College, London, SW7 2AZ, UK.

Email: {A.Serb, T.Prodromakis}@soton.ac.uk, {c.papavas}@imperial.ac.uk

Abstract—The ability to identify (detect) and categorise (sort) neural spikes in real-time and under highly restrictive power/area budgets is a major enabling technology towards the development of intelligent implantable systems. In this work we propose a memristor-CMOS hybrid architecture concept that relies on a ‘template pixel’ (texel) circuit combining CMOS and memristive devices to perform on-line spike sorting through template matching. We show through simulation how the texel is capable of comparing an input voltage against a stored (in the memristors) value and converting the degree of matching between input and stored pattern into a current. We further illustrate the fundamental texel design space that includes tuning it to a different preferred input voltage and controlling the sharpness of the tuning. Finally, we estimate that even in an unoptimised technology and design a texel array capable of recognising three different 10-point patterns will consume a very promising maximum of 3.15 μW for a footprint of approx. 500 μm².

Index Terms—CMOS, memristors, spike sorting

I. INTRODUCTION

A key component of the global effort to understand the functioning of the human brain pertains to the development of brain-machine interfaces capable of recording neuronal activity in-vivo; itself a whole research area progressing under its own version of Moore’s law [1]. Typically, large-scale neural activity monitoring is achieved through implantable systems that consist of four broad blocks: a) an electrode array [2], b) an Analogue Front-End (AFE) block usually consisting of amplification, filtering and digitisation [3], c) a signal processing block (Back-End) that may either concentrate on performing single-neuron activity detection (spike detection [4] or sorting [5]) or include Local Field Potential (LFP) extraction [6] and d) telemetry for transmitting the extracted data to the external receiver (Fig. 1).

In this work we concentrate on the back-end block and specifically single-neuron activity detection. At the algorithmic level this is currently achieved through a multitude of approaches such as threshold detection [7], non-linear energy operator [8], template matching [9] and others; each offering a different point in the implementation complexity vs. accuracy trade-off space [10]. What all these methods have in common, however, is the objective of compressing a high data-rate voltage-time series signal arriving from the AFE block into a low data-rate/high-information output signal encoding the timing of neuronal action potentials (spikes) only, while suppressing noise (Fig. 1).

At the implementation level on-chip spike detectors and sorters have to: a) operate using minimal area and power budgets, b) achieve maximum data compression to ease the power budget of the telemetry block and c) avoid placing excessively area/power expensive signal preconditioning requirements on the AFE block. Whilst architectures utilising fully digital [11] and mainly analogue [5] techniques seeking to address these requirements have already been proposed, more recently we have demonstrated a memristor-based methodology for spike detection and rudimentary sorting [12]. Memristors are electronic components that change their resistive state when appropriate voltages are applied across their terminals and can be practically implemented using a variety of technologies [13]. This fundamental property allows memristors to act as thresholded integrators, an ability that the proposed memristor-based system exploits in order to perform threshold detection of action potentials in a single component.

In this work we build on [12] by presenting a memristor-CMOS hybrid circuit for performing non-rudimentary on-line spike sorting via template matching. The CMOS component performs the template matching in the analogue domain whilst the memristors are used to store the template library in an area-efficient way (leveraging their back-end-of-line integrability). The rest of the paper is organised as follows: In Section II the operating principles and architecture of the proposed circuit are shown. Section III shows simulation results detailing the limits of configurability afforded to the template matching system by the memristive components whilst Section IV discusses some of the important practical operating considerations and concludes the paper.

II. CONCEPT AND OPERATIONAL PRINCIPLES

Template matching-based spike sorting systems rely on the basic principle that action potentials emitted by the same neuron will be recorded as stereotyped waveforms as shown in Fig. 1(a). Repeatedly time-sampling these stereotyped waveforms allows the generation of templates that a discrete-time system such as our proposed one could then recognise and discriminate from other templates. We shall refer to each template sample as a ‘template pixel’ (‘texel’).

A. Overall system architecture

The cornerstone of the proposed architecture is a standardised, programmable texel circuit that assembles into arrays capable of scanning conditioned input signals for the presence of spikes on-line, as shown in Fig. 2. The system operates as follows: Each channel receives a preconditioned neural data signal \( f(t) \) from the AFE which then feeds into a comparator (U1). Once \( f(t) \) exceeds the ‘spike detection’ threshold \( TH_1 \), a spike is considered to be occurring (a technique also used in [4]). That triggers the Finite State Machine (FSM) which draws a fixed number of incoming analogue input signal samples and distributes them to a bank of Sample & Hold (SH) circuits; effectively an analogue register. Every tap in the register then feeds an entire column of texels, but only texels that receive an input sufficiently close to their preferred, programmed input will respond by outputting a current on to the output line of the template they belong to. The different spike templates to be recognised are stored in the different texel rows of the array. The current entering the output line of each template is integrated on a capacitor \( C_1 \) for a suitable amount of time and if an adjustable ‘spike recognition’ threshold \( TH_2 \) is exceeded a spike belonging to the corresponding template is registered. This small-bandwidth/high-information flag signal can then be transmitted outside the body. At the end of the entire procedure \( C_1 \) is reset. The system operates in discrete time as coordinated by a clock enforcing a suitable sampling rate (typ. \( 7 - 28 \, \text{kHz} \) [10]).
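The signal flow described above can be illustrated with a purely behavioural sketch. It is not a circuit model: the bell-shaped texel response, its `width` and `i_peak` parameters, and all function names are assumptions introduced here for illustration, and the integration on \( C_1 \) is reduced to a plain sum of texel currents compared against \( TH_2 \).

```python
import math


def texel_current(v_in, v_pref, width=0.05, i_peak=1.0):
    """Hypothetical tuning curve: output current peaks when v_in == v_pref.

    The Gaussian shape is an assumption; the real texel response is set by
    the two-stage circuit of Fig. 4.
    """
    return i_peak * math.exp(-0.5 * ((v_in - v_pref) / width) ** 2)


def match_templates(samples, templates, th2):
    """Sum the texel currents of each template row and compare with TH2."""
    flags = []
    for row in templates:
        total = sum(texel_current(v, p) for v, p in zip(samples, row))
        flags.append(total > th2)
    return flags


def detect_and_sort(signal, templates, th1, th2):
    """Scan a sampled signal; on a TH1 crossing, capture one window and match it."""
    n = len(templates[0])  # points per template = number of SH taps
    events = []
    i = 0
    while i < len(signal):
        if signal[i] > th1 and i + n <= len(signal):
            window = signal[i:i + n]  # assumed: triggering sample is tap 0
            events.append((i, match_templates(window, templates, th2)))
            i += n  # no re-trigger until the SH bank is full
        else:
            i += 1
    return events


templates = [[0.6, 0.8, 1.0], [0.9, 0.7, 0.6]]
signal = [0.0, 0.0, 0.6, 0.8, 1.0, 0.0]
events = detect_and_sort(signal, templates, th1=0.55, th2=2.5)
```

In this toy example the window starting at sample 2 coincides with the first template row, so only that row's flag is raised.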

B. Block description

1) Input comparator, FSM and SH: The comparator can be implemented using either a standard low power clocked latch design (Fig. 3(a)) or a continuous time operational amplifier. In the former case power will be saved, but in the latter the expected reduction in sampling jitter may improve accuracy. The FSM can be implemented either as a counter or as a linear one-hot register, in both cases feeding a multiplexer. The multiplexer, in turn, routes a succession of input signal samples to the SH bank. Once triggered the FSM cannot be re-triggered until the SH bank is full. Finally, the SH circuit is implementable using the switched-capacitor circuit topology that carries out correlated double sampling in CMOS imagers [14] (Fig. 3(b)). In the present system, however, the input is single-ended. The optimisation of the circuitry supporting the texel array lies outside the scope of this paper.
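The counter-based FSM option and its re-trigger lockout can be sketched behaviourally as follows. The class and method names are hypothetical, and the sketch assumes that the sample which trips the comparator is itself routed to the first tap; the text does not specify this.

```python
class SampleHoldBank:
    """Behavioural model of a counter FSM routing samples to an SH bank."""

    def __init__(self, n_taps):
        self.taps = [None] * n_taps  # held analogue values, one per tap
        self.pointer = 0             # counter selecting the multiplexer output
        self.busy = False            # True while a capture is in progress

    def clock(self, sample, trigger):
        """Advance one clock cycle; return True when the bank has just filled."""
        if not self.busy:
            if not trigger:
                return False         # idle: wait for the comparator to fire
            self.busy = True         # start a new capture
            self.pointer = 0
        # While busy, the trigger input is ignored (no re-triggering).
        self.taps[self.pointer] = sample
        self.pointer += 1
        if self.pointer == len(self.taps):
            self.busy = False        # bank full: FSM may be triggered again
            return True
        return False


bank = SampleHoldBank(3)
stream = [0.1, 0.9, 0.8, 0.7, 0.2]
full_flags = [bank.clock(s, trigger=s > 0.5) for s in stream]
```

A one-hot register implementation would behave identically at this level of abstraction; only the encoding of `pointer` differs.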

2) The texel: The basic architecture of the texel is shown in Fig. 4 and consists of two stages. The first stage is an inverter where a memristor-based potential divider has been introduced between its transistors. By changing the resistive states of the memristors the switch point of the inverter can be shifted (see Fig. 4(b,c)). The second stage is another inverter, this one fed through a mirror supply. The output of the texel circuit is not the voltage, but the current draw of the second stage. The result of this topology is that by shifting the switch point of stage 1 the texel input voltage causing [...] are simultaneously maximally open, which will occur at some Vd voltage. The operation of the texel can be understood by examining two transfer characteristics (I-V curves) [...] R1, R2 is evident, as is the ‘plateau’ region where the memristor divider [...].

In order to better understand the capabilities of the proposed architecture the texel circuit was simulated over a broad range of resistive state values for R1 and R2. Performance was then assessed using three key metrics: a) the voltage \( V_{pk} \) at which \( I_{out} \) peaks, b) the breadth of the voltage range for which \( I_{out} \geq I_{out}(V_{pk})/2 \), i.e. the full-width half maximum (FWHM), and c) the maximum overall texel steady-state power dissipation at its preferred \( V_{IN} \). Results are summarised in Fig. 5. For these simulations TSMC 0.35 \( \mu m \) technology devices have been used, C1 and C2 are set to 1 pF (small but controllably achievable in integrated implementations), and VDD = 1.65 V.

The simulations reveal a number of interesting trends: First, \( V_{pk} \) is tunable within a range of \( \approx 0.55 - 1.05 \) V, which is broadly similar to the range defined by the power supply minus the thresholds of the p- and n-MOS devices \([V_{th,n}, VDD - V_{th,p}]\) (approx. \([0.50, 0.94]\) V in this technology). Second, memristor resistive states within 0.1 – 5 M\( \Omega \) suffice for covering a large part of that range. Third, the FWHM plot indicates that to some extent it might be possible to cover most of the \( V_{pk} \) range at a controllable degree of sensitivity. The broader the FWHM, the blunter the ‘tuning curve’ of the texel (i.e. the broader the range).
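As an illustration of how metrics a) and b) can be read off a simulated tuning curve, the following sketch extracts the peak position and the FWHM from a sampled, single-peaked I-V sweep. The function name and the example data are hypothetical; no interpolation is performed, so the reported FWHM is only accurate to the sweep step.

```python
def peak_and_fwhm(v, i):
    """Return (v_pk, fwhm) for a sampled, single-peaked curve i(v).

    v and i are equal-length sequences; v is assumed to be increasing.
    """
    k = max(range(len(i)), key=i.__getitem__)  # index of the current peak
    half = i[k] / 2.0
    # Walk outwards from the peak while the current stays at or above
    # half of its maximum value.
    lo = k
    while lo > 0 and i[lo - 1] >= half:
        lo -= 1
    hi = k
    while hi < len(i) - 1 and i[hi + 1] >= half:
        hi += 1
    return v[k], v[hi] - v[lo]


# Made-up sweep: input voltage in mV, output current in arbitrary units.
v_mv = list(range(500, 1050, 50))
i_out = [0, 0, 1, 2, 4, 2, 1, 0, 0, 0, 0]
v_pk, fwhm = peak_and_fwhm(v_mv, i_out)
```

Repeating this extraction for every (R1, R2) pair of a sweep would yield maps of the kind summarised in Fig. 5.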

[...] stored templates. Whilst the texel array is computing the match the matching row will consume approx. 10 pts · 1.5 μW = 15 μW whilst the non-matching rows will consume approx. 2 patterns · 10 pts · 0.5 μA · 1.65 V = 16.5 μW for a total of 31.5 μW (based on the ‘table-top’ current in Fig. 4(c)). If the system operates at 12 kHz this comparison can be performed at most at \( f_{\text{sample}}/(\text{pts}/\text{template}) = 1.2 \text{ kHz} \). If we further assume that a texel assessment can be completed within \( \frac{1}{f_{\text{sample}}} \approx 83 \, \mu s \), i.e. that the array is active for one tenth of the time, then the maximum channel power dissipation for an input signal consisting of a constant stream of back-to-back matchable spikes drops to 3.15 μW.

Area: transistors M4, M5 and M6, occupying a \( W \cdot L \) of \( 120 \times 1 \text{ μm}^2 \) each, comprise the majority of the total nominal transistor \( W \cdot L \) footprint of \( 415.5 \text{ μm}^2 \) (approx. 500 μm² incl. 20% overhead).

Note: These calculations are intended to illustrate rough expected power/area overheads only. The technology, transistor sizings and power supply voltage are not optimised and the currents flowing through the system are conservative (e.g. a reasonable template-wide match estimate is obtainable with \( I_{\text{out}} < 250 \text{ nA/texel} \)).
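The back-of-the-envelope figures above can be reproduced with a few lines of arithmetic. The numerical inputs are those quoted in the text; the variable names and the explicit duty-cycle step are ours, the latter being one way of reading the drop from 31.5 μW to 3.15 μW.

```python
# Inputs quoted in the text
pts = 10                  # points (texels) per template
n_templates = 3           # stored templates (rows)
p_match_texel = 1.5e-6    # W, per texel in the matching row
i_plateau = 0.5e-6        # A, 'table-top' current of a non-matching texel
vdd = 1.65                # V
f_sample = 12e3           # Hz

# Power while the array is computing a match
p_match_row = pts * p_match_texel                         # 15 uW
p_other_rows = (n_templates - 1) * pts * i_plateau * vdd  # 16.5 uW
p_active = p_match_row + p_other_rows                     # 31.5 uW

# Duty cycling: one assessment of length 1/f_sample per pts samples
f_compare_max = f_sample / pts        # 1.2 kHz
t_assess = 1.0 / f_sample             # approx. 83 us
duty = t_assess * f_compare_max       # 1/pts = 0.1
p_channel_max = p_active * duty       # 3.15 uW

# Area: nominal transistor W*L plus 20% overhead
area_nominal_um2 = 415.5
area_total_um2 = area_nominal_um2 * 1.2   # 498.6 um^2, i.e. approx. 500 um^2
```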

IV. DISCUSSION AND CONCLUSIONS

The proposed architecture features its own set of design considerations. First, the AFE block is affected through its input signal range requirements (usable range of \( \approx 0.5 \text{ V} \)), the most notable feature being the difference between input range and power supply. This may be potentially addressed using lower threshold transistors in suitable CMOS technologies, in which case the supply voltage may be able to drop without loss of performance. Another important consideration is noise. This is mitigated by the capacitor in the SH circuit (Fig. 3(b)) and by the integrator-based read-out approach (Fig. 2). Next, memristors are non-linear I-V elements which may render control over the precise distribution of voltage in the first stage of each texel challenging. This can be mitigated by using one (or a few) normally-off programmer per chip that programs the texel array one row at a time; accessing the memristors in each texel individually and manipulating them until the pattern current is maximised at the correct input (currently under development).

Simultaneously, the architecture inherently allows control over the tuning sharpness for each template through adjustment of \( TH_2 \) in Fig. 2, shows great promise in terms of down-scaling both in area (6 transistors/texel + back-end elements) and power, and obviates the need for an analogue-to-digital converter (ADC) anywhere in the system.

In conclusion we have presented a concept architecture for a memristor-CMOS hybrid on-line template matcher with a view towards integrated implementation. We discussed the basic operating principles, gave simulation results based on TSMC 0.35 μm technology illustrating the reconfigurability of the texel circuit underlying the architecture and performed back-of-the-envelope power and area overhead calculations. With further optimisation this technology offers a potentially disruptive solution to the problem of brain recording.

REFERENCES


