In the first
“Binary search isn’t about search”
post, we spoke about
using assert
statements to enforce your loop invariants.
Our plain old everyday binary search invariant can be summarized as such:
- For all x in
L[0:l]
1, x is strictly less thanT
, the element we are searching for. - For all y in
L[r:len(L)]
2, y is strictly greater thanT
, the element we are searching for.
Or, if we want to be even terser, we could note this as simply
|
|
Suppose our array has duplicates, though; the ordinary binary search will find one of them, but it makes no claims as to which one. Indeed, different binary searches will find different indices, which you can prove for yourself by running the linked script.
What if we want to get the first matching element in the array?
An example which suggests a new invariant
Let’s consider a simple array which looks like the following:
We don’t know yet exactly how to get this first 1
. But we do know we need mid
to eventually get
in the area. The first invariant, L[0:l] < T
, seems fine; we could slide l
right up to the
first 1
, so that L[0:l]
contains only zeroes. The second invariant, L[r:len(L) < T
, however,
has to be weakened – there’s just no way to make this nonempty which doesn’t contain 1
s.
Let’s see what we can design with the new invariant,
|
|
That suggests an implementation that looks like
|
|
And, indeed, this works! Running the Python script at the link will show you a fancy command-line visualization of this very thing in practice.
Our example will end looking like so:
Why not return mid
?
The last thing we should mention is that, in our “ordinary” invariant-based binary search,
we either return mid
or we return -1
(for “Not Found”). Here, we elect to return l
.
But why? How can we remember this?
Take a look again at the invariant
|
|
Unlike
|
|
the new weak inequality means we actually can find values of l
and r
such that
L == L[0:l] + L[r:len(L)]
in every case. In other words, L[l:r]
can and should always
be allowed to dwindle to nothing. This is also a characteristic the rightmost element
search has!
What happens when we leftmost search for an element that isn’t there?
Our ordinary binary search gives us -1
when T not in L
. Our leftwise element
search, however, does something much more interesting: It gives us the index
where that element would be inserted if it was there.
You can visualize this if you return to our early example. Suppose we had some
larger number than 1, but we tried searching for 1
. Our invariants would
still give us exactly the same indices at the end:
You could of course get your “normal” if T not in L return -1
property back
without affecting the time complexity by changing that end statement to
something like
|
|
Next up: Rightwise element search. If this all made sense to you there should be little to surprise you there.