Learning just how straightforward a Verilog UART receiver is to implement was quite the eye-opening experience. In fact, I’d even say that my receiver is a good bit simpler than its own testbench. I think the big takeaway here is that the bottleneck of ASIC development isn’t the implementation of whatever you’re trying to build, it’s the verification.
In my first UART post I introduced the theory of operation behind the receiver. As you might remember, the basic structure consists of a finite state machine with 4 states. This is implemented as a case statement (Verilog’s equivalent of the switch-case found in C-like languages; the keyword is just case, without the “switch”).

The first state, “wait for falling edge” (called STATE_IDLE in my code), does the least work. In theory all it has to do is check whether the rx line is LOW and change to the next state if it is. In practice it is nice to have some really basic low-pass filtering to ignore glitches and make the receiver more robust.
STATE_IDLE: begin
    // check for falling edge with some basic filtering
    if((rx_shift_reg == 4'b0001) || (rx_shift_reg == 4'b0010)) begin
        cur_state <= STATE_RECEIVE_START;
    end
    ctr <= 4;
    data_ready <= 0;
    bit_ctr <= 0;
    data <= 0;
end
Outside the case statement I use a shift register to store the most recent rx samples. STATE_IDLE can then check the shift register’s contents for the falling edge pattern. New samples enter at the most significant bit, so 4'b0001 means one HIGH sample followed by three LOW samples. In my case a single HIGH glitch among three LOW samples (4'b0010) is also recognized as a falling edge. If a falling edge is detected, the state is changed to STATE_RECEIVE_START, which will receive the start bit.
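To make it easier to see which sample sequences produce the two accepted patterns, here is a minimal sketch of just the filter on its own. The module name falling_edge_filter and the output edge_seen are made up for this illustration; only the shift direction and the two patterns are taken from the receiver.

```verilog
// Hypothetical standalone version of the glitch filter (names are illustrative).
module falling_edge_filter(
    input  clk, rst, rx,
    output edge_seen
);
    reg [3:0] rx_shift_reg;

    always @(posedge clk) begin
        if (rst)
            rx_shift_reg <= 4'b0000;
        else
            rx_shift_reg <= {rx, rx_shift_reg[3:1]};  // newest sample enters at bit 3
    end

    // 4'b0001: one old HIGH sample followed by three LOW samples
    // 4'b0010: LOW, HIGH, LOW, LOW (a single HIGH glitch during the edge)
    assign edge_seen = (rx_shift_reg == 4'b0001) || (rx_shift_reg == 4'b0010);
endmodule
```

With the line idle HIGH (register at 4'b1111), the samples 0, 0, 0 leave 4'b0001 in the register, and the samples 0, 1, 0, 0 leave 4'b0010. Both raise edge_seen for one clock cycle.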
Inside STATE_IDLE the counter ctr is set to 4. This might seem odd at first, but there is a simple reason: because of the low-pass filter, this code can only detect a falling edge after about 4 clock cycles have passed. If the counter were still at 0 when we enter the next state, we’d be off by 4 cycles. For lower UART speeds this doesn’t matter, but for higher speeds it might be an issue.
Another thing that’s happening in this state is the reset of data_ready, bit_ctr and data. This should be fairly self-explanatory, but in case it isn’t: this is simply preparation for the next states. They expect a clean slate and this is how they get it.
Next up is STATE_RECEIVE_START. It simply exists to receive the start bit.
STATE_RECEIVE_START: begin
    if(rx != 0 && ctr < start_bit_duration) begin
        // rx went back high again, this wasn't a valid start bit.
        // reset to idle state
        cur_state <= STATE_IDLE;
        ctr <= 0;
    end
    // Check if start bit is over, if yes, move to next state
    if(ctr == bit_duration) begin
        ctr <= 0;
        cur_state <= STATE_READ_DATA;
    end
end
There are really only 2 things that can happen here. Either a) rx goes high before the start bit is over, or b) the start bit is over. In the first case we reset to STATE_IDLE. We didn’t receive a valid start bit, so we don’t want to continue and start fresh instead. If the transmission is a bit glitchy or the clocks of receiver and transmitter aren’t perfectly synchronous, the start bit might not be recognized reliably though. To understand just how critical this timing can be, let’s take a look at a fairly simple 9600 baud transmission and an internal clock of 20 MHz. Shouldn’t be too problematic, right? Well, as we’ll see, the problem isn’t the baudrate, it’s the internal clock speed.
9600 baud = 9600 Hz => Tbit = 1 / 9600 Hz ≈ 104.2 µs
Tclk = 1 / fclk = 1 / 20000000 Hz = 50 ns = 0.05 µs
From this we can calculate the maximum relative deviation: trel = Tclk / Tbit ≈ 0.00048 = 0.048 % = 480 ppm
Whoa! That’s quite low. Granted, it’s not “absolutely impossible”-low, but it’s still a much smaller tolerance than I’d prefer. If you breathe on your IC wrong you might just heat it enough to sufficiently change the clock (probably not). I’d like a little more leeway than that.
The way around this was to have 2 durations: one to “wait out” the start bit (bit_duration) and another during which rx must be LOW (start_bit_duration). They can be the same, but in practice I’ll probably set start_bit_duration to something in the 80-90% range of bit_duration.
After start_bit_duration has passed, we still don’t want to move on to the next state though. We simply stop checking rx. Only after bit_duration has passed will the state machine move on. I could have implemented this with another state, but didn’t really see the point for something that simple. In the end, splitting the code up into states serves to simplify the code. Introducing an additional state for this would have the opposite effect in my opinion.
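As a rough sketch of how the two durations could be derived for the 20 MHz / 9600 baud example, here is a small simulation-only module. The module name, the parameter names and the 85 % factor are my assumptions for illustration; the receiver itself just takes bit_duration and start_bit_duration as inputs.

```verilog
// Illustrative only: derives the two durations from clock frequency and baud rate.
module uart_timing_example;
    localparam integer CLK_HZ = 20000000;  // 20 MHz internal clock
    localparam integer BAUD   = 9600;

    // Clock cycles per bit, rounded down by integer division
    localparam integer BIT_DURATION = CLK_HZ / BAUD;
    // Assumption: check rx for 85 % of the start bit (somewhere in the 80-90 % range)
    localparam integer START_BIT_DURATION = (BIT_DURATION * 85) / 100;

    initial begin
        $display("bit_duration       = %0d", BIT_DURATION);
        $display("start_bit_duration = %0d", START_BIT_DURATION);
    end
endmodule
```

With these numbers the integer division gives 2083 cycles per bit and 1770 cycles for the start bit check, so rx may wander for the last 313 cycles of the start bit without resetting the state machine.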
Next on the list is STATE_READ_DATA, the bread and butter of this module. It receives the actual payload bits.
STATE_READ_DATA: begin
    if(ctr == bit_duration >> 1) begin
        // we're in the middle of the bit, this is the point where we want to read the bit
        // increment the bit counter
        bit_ctr <= bit_ctr + 1;
        // shift in the bit that was read
        data <= {rx, data[7:1]};
    end
    if(ctr == bit_duration) begin
        // at this point the current bit was read, we can either read the next data bit or the stop bit(s)
        if(bit_ctr == 8) begin
            // we have read all 8 databits, read stop bit next
            cur_state <= STATE_STOPBITS;
            ctr <= 0;
        end else begin
            ctr <= 0;
        end
    end
end
As you can see, there are two if blocks. The first compares the counter to bit_duration >> 1, which is half of bit_duration (rounded down). We want to read the payload bit smack-bang in the middle. We store the bit in a shift register called data and increment the counter bit_ctr, which counts how many bits we have received already.
The second if statement checks if the bit is over. If it is, what follows is either another databit or the stopbit. We differentiate between these two by checking bit_ctr, which should be 8 once we have received all 8 bits. If it isn’t, we read another bit; if it is, we move to STATE_STOPBITS.
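Since UART transmits the least significant bit first, shifting each new bit in at the top of data puts every bit in its proper place after 8 shifts. Here is a small simulation-only sketch of that; the module name and tx_byte are just for illustration.

```verilog
// Self-contained simulation sketch; module and signal names are illustrative.
module shift_in_example;
    reg [7:0] data;
    reg [7:0] tx_byte;  // hypothetical byte "on the wire"
    integer i;

    initial begin
        data    = 8'h00;
        tx_byte = 8'h41;  // ASCII 'A' = 8'b0100_0001
        // The transmitter sends bit 0 first and bit 7 last
        for (i = 0; i < 8; i = i + 1)
            data = {tx_byte[i], data[7:1]};  // same shift as in STATE_READ_DATA
        $display("received = 0x%h", data);   // prints: received = 0x41
    end
endmodule
```

The first bit received enters at data[7] and is pushed one position to the right by each of the 7 following bits, so it ends up at data[0].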
Lastly there is STATE_STOPBITS. And yes, I actually went all the way and implemented the option to have a variable stop bit length.
STATE_STOPBITS: begin
    // Count to stop bit duration, then reset state to idle
    if((stopbits == 2'b00) && (ctr == bit_duration >> 1)) begin
        // 0.5 stop bits
        data_ready <= 1;
        cur_state <= STATE_IDLE;
    end else if ((stopbits == 2'b01) && (ctr == bit_duration)) begin
        // 1 stop bit
        data_ready <= 1;
        cur_state <= STATE_IDLE;
    end else if ((stopbits == 2'b10) && (ctr == {1'b0,bit_duration[15:0]} + (bit_duration >> 1))) begin
        // 1.5 stop bits: adds half of bit_duration to bit_duration in a 17-bit wide expression.
        // The parentheses around the shift are required because + binds tighter than >>
        data_ready <= 1;
        cur_state <= STATE_IDLE;
    end else if(ctr[16:1] == bit_duration) begin
        // 2 stop bits: functionally equivalent to a left shift of bit_duration.
        // ctr is 17 bits wide, so bits [16:1] cover the full 16-bit range of bit_duration
        data_ready <= 1;
        cur_state <= STATE_IDLE;
    end
end
Depending on the number of stopbits (0.5, 1, 1.5 or 2), ctr must be compared to different values. Half a stopbit, one stopbit and two stopbits are pretty simple: we compare ctr to bit_duration shifted by -1, 0 or +1 positions (for two stopbits the code shifts ctr right instead, which amounts to the same comparison). 1.5 stopbits is where it gets a little ugly though. Multiplication is slow, so we add bit_duration and its half instead. Since + binds tighter than >> in Verilog, the shift needs its own parentheses; without them the whole sum gets halved and we end up with 1x instead of 1.5x bit_duration. And since that addition might overflow, the expression has to be 1 bit wider than bit_duration itself. It all makes for some pretty ugly code in the end.
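Operator precedence is easy to get wrong in an expression like the 1.5x comparison, so here is a quick simulation-only sketch of the difference. The module name and the value 2083 (20 MHz clock, 9600 baud) are assumptions for illustration.

```verilog
// Simulation-only sketch: + has higher precedence than >> in Verilog.
module precedence_example;
    reg [15:0] bit_duration;
    reg [16:0] without_parens, with_parens;

    initial begin
        bit_duration = 16'd2083;  // assumption: 20 MHz clock, 9600 baud
        // Parsed as ({1'b0, bit_duration} + bit_duration) >> 1
        without_parens = {1'b0, bit_duration} + bit_duration >> 1;
        // Parsed as bit_duration + (bit_duration / 2)
        with_parens    = {1'b0, bit_duration} + (bit_duration >> 1);
        $display("without parentheses: %0d", without_parens);  // 2083
        $display("with parentheses:    %0d", with_parens);     // 3124
    end
endmodule
```

Only the parenthesized version yields 1.5x bit_duration; the other one silently turns 1.5 stopbits into 1.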
Maybe you wonder why I’m not checking whether rx stays high during the stopbit. According to the UART protocol the line must stay high, after all. The simple answer is that it’s not really necessary in order to receive the byte. The stopbits serve as a pause between two transmissions. It doesn’t really matter what the rx line is doing after the payload has been received. Either rx stays high and everything is perfectly fine, or the next transmission starts early. If the next transmission starts early, that shouldn’t affect the validity of the current byte. It should only affect the validity of the new, “early” byte. With my implementation the new byte will cause the state machine to detect an invalid state eventually and reset to STATE_IDLE. That might even happen multiple times. If the next byte after that is valid again, the receiver will be able to detect it, since it will be in STATE_IDLE.
What actually happens here is pretty simple though. We set the data_ready flag HIGH to indicate that the value in the data register is valid and then return to STATE_IDLE. Any consumer of this uart module may then check the data_ready bit and take the data from this module whenever that flag is high.
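A consumer could look roughly like the following sketch, which copies data into its own register in the cycle where data_ready is high. The module rx_byte_latch and its ports last_byte and byte_valid are hypothetical and not part of my receiver.

```verilog
// Hypothetical consumer; module and port names are not part of uart_rx.
module rx_byte_latch(
    input            clk, rst,
    input      [7:0] data,        // connect to the data output of uart_rx
    input            data_ready,  // connect to the data_ready output of uart_rx
    output reg [7:0] last_byte,   // most recently received byte
    output reg       byte_valid   // sticky flag: at least one byte was captured
);
    always @(posedge clk) begin
        if (rst) begin
            last_byte  <= 8'h00;
            byte_valid <= 1'b0;
        end else if (data_ready) begin
            // data is only guaranteed to be valid while data_ready is high
            last_byte  <= data;
            byte_valid <= 1'b1;
        end
    end
endmodule
```

Because the byte is captured into a separate register, whatever the receiver's data output does between bytes never reaches the rest of the design.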
The result looks good as you can see below.

Figure 2 might warrant a short explanation. clk is obviously the internal clock. It’s ticking fast. Really fast in fact. 20 MHz to be a little more precise. uart_tx_test is the test transmission sent by the testbench. The interesting waveforms are data_ready and data[7:0]. The former only ever goes high for a single clock cycle: whenever data contains valid data and the transmission is complete, data_ready will go high. data, on the other hand, contains garbage most of the time during and after each transmission. The reason is that it is simply the raw output of the receiver’s internal shift register and displays intermediate states of the transmission. Outputting invalid data is a terrible idea though, so what gives? I’m actually planning to deal with this at a higher level of abstraction. My uart_rx module isn’t supposed to be used on its own; it will be part of a more abstract UART module. This module will contain all the goodies one expects of a UART peripheral, such as transmit/receive FIFOs, interrupts, simple baudrate configuration, a way to integrate it with a DMA, memory mapping (to access it like any other RAM address) and a whole bunch of other things.
Notable missing features are parity calculation and simple baudrate configuration. I might implement those in the future. They’re both easy to implement. As for why I haven’t added them yet: I don’t need parity calculation, and I think simple baudrate configuration should be handled at a higher level of abstraction.
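To give an idea of how little a parity check would take, here is a minimal sketch using Verilog’s reduction XOR. The module parity_check and all of its ports are hypothetical; nothing like it exists in my receiver yet, and sampling the parity bit itself would still need an extra step in the state machine.

```verilog
// Hypothetical helper, not part of uart_rx: checks one received parity bit.
module parity_check(
    input  [7:0] data,
    input        parity_bit,   // the bit received after the 8 data bits
    input        odd_parity,   // 0 = even parity, 1 = odd parity
    output       parity_error
);
    // ^data is 1 when data contains an odd number of ones.
    // Even parity expects parity_bit == ^data, odd parity expects the inverse,
    // so the XOR of all three signals is 1 exactly when the check fails.
    assign parity_error = (^data) ^ parity_bit ^ odd_parity;
endmodule
```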
As always, for completeness’ sake, here’s the full code:
module uart_rx(
    input clk, rst, rx,
    input [15:0] bit_duration, start_bit_duration,
    input [1:0] stopbits,
    output reg [7:0] data,
    output reg data_ready
);

reg [3:0] rx_shift_reg;
reg [16:0] ctr; // larger than bit_duration to prevent overflows during internal calculations
reg [4:0] bit_ctr;
reg [1:0] cur_state;

// States for uart statemachine
parameter STATE_IDLE          = 2'b00;
parameter STATE_RECEIVE_START = 2'b01;
parameter STATE_READ_DATA     = 2'b10;
parameter STATE_STOPBITS      = 2'b11;

always @ (posedge clk) begin
    if(rst) begin
        // set internal registers to 0
        ctr <= 0;
        rx_shift_reg <= 0;
        bit_ctr <= 0;
        // start from a defined state after reset
        cur_state <= STATE_IDLE;
        // set outputs to 0
        data <= 0;
        data_ready <= 0;
    end else begin
        // Shift register of past rx values for filtering etc
        rx_shift_reg <= {rx, rx_shift_reg[3:1]};
        ctr <= ctr + 1;

        // rx state machine here
        case(cur_state)
            default: cur_state <= STATE_IDLE;

            STATE_IDLE: begin
                // check for falling edge with some basic filtering
                if((rx_shift_reg == 4'b0001) || (rx_shift_reg == 4'b0010)) begin
                    cur_state <= STATE_RECEIVE_START;
                end
                ctr <= 4;
                data_ready <= 0;
                bit_ctr <= 0;
                data <= 0;
            end

            STATE_RECEIVE_START: begin
                if(rx != 0 && ctr < start_bit_duration) begin
                    // rx went back high again, this wasn't a valid start bit.
                    // reset to idle state
                    cur_state <= STATE_IDLE;
                    ctr <= 0;
                end
                // Check if start bit is over, if yes, move to next state
                if(ctr == bit_duration) begin
                    ctr <= 0;
                    cur_state <= STATE_READ_DATA;
                end
            end

            STATE_READ_DATA: begin
                if(ctr == bit_duration >> 1) begin
                    // we're in the middle of the bit, this is the point where we want to read the bit
                    // increment the bit counter
                    bit_ctr <= bit_ctr + 1;
                    // shift in the bit that was read
                    data <= {rx, data[7:1]};
                end
                if(ctr == bit_duration) begin
                    // at this point the current bit was read, we can either read the next data bit or the stop bit(s)
                    if(bit_ctr == 8) begin
                        // we have read all 8 databits, read stop bit next
                        cur_state <= STATE_STOPBITS;
                        ctr <= 0;
                    end else begin
                        ctr <= 0;
                    end
                end
            end

            STATE_STOPBITS: begin
                // Count to stop bit duration, then reset state to idle
                if((stopbits == 2'b00) && (ctr == bit_duration >> 1)) begin
                    // 0.5 stop bits
                    data_ready <= 1;
                    cur_state <= STATE_IDLE;
                end else if ((stopbits == 2'b01) && (ctr == bit_duration)) begin
                    // 1 stop bit
                    data_ready <= 1;
                    cur_state <= STATE_IDLE;
                end else if ((stopbits == 2'b10) && (ctr == {1'b0,bit_duration[15:0]} + (bit_duration >> 1))) begin
                    // 1.5 stop bits: adds half of bit_duration to bit_duration in a 17-bit wide expression.
                    // The parentheses around the shift are required because + binds tighter than >>
                    data_ready <= 1;
                    cur_state <= STATE_IDLE;
                end else if(ctr[16:1] == bit_duration) begin
                    // 2 stop bits: functionally equivalent to a left shift of bit_duration.
                    // ctr is 17 bits wide, so bits [16:1] cover the full 16-bit range of bit_duration
                    data_ready <= 1;
                    cur_state <= STATE_IDLE;
                end
            end
        endcase
    end
end
endmodule