DEV Community

Cover image for Substring search
James Robb
James Robb

Posted on • Edited on

Substring search

One common task we do daily as software engineers is working with strings to parse inputs, validate values and the like.

In this article we will be writing a function to check if a given string contains a given substring. This process is commonly known as a substring search and is usually a built-in feature of most languages but we will implement the functionality for ourselves to understand the process at hand!

For todays article we will be working with Rust as our language of choice and Cargo which is the Rust package manager.

To begin we must execute the following in the command line:

mkdir substring_search
cd substring_search
cargo init --name substring_search
Enter fullscreen mode Exit fullscreen mode

This will create a directory called substring_search, move into that directory and then initialise a new Rust project via Cargo named substring_search.

Tests

As always, lets begin first with the tests!

In src/main.rs we can write the following:

#[cfg(test)]
mod tests {
    const SEARCH_STRING_SLICE: &'static str = "abcdefg";

    #[test]
    fn returns_true_for_substring_at_start_of_string() {
        let haystack = SEARCH_STRING_SLICE.to_string();
        let needle = String::from("ab");

        assert_eq!(super::contains(haystack, needle), true);
    }

    #[test]
    fn returns_true_for_substring_at_middle_of_string() {
        let haystack = SEARCH_STRING_SLICE.to_string();
        let needle = String::from("d");

        assert_eq!(super::contains(haystack, needle), true);
    }

    #[test]
    fn returns_true_for_substring_at_end_of_string() {
        let haystack = SEARCH_STRING_SLICE.to_string();
        let needle = String::from("fg");

        assert_eq!(super::contains(haystack, needle), true);
    }

    #[test]
    fn returns_false_for_invalid_substring() {
        let haystack = SEARCH_STRING_SLICE.to_string();
        let needle = String::from("hij");

        assert_eq!(super::contains(haystack, needle), false);
    }
}
Enter fullscreen mode Exit fullscreen mode

Walking through this, we begin by declaring an attribute which states that when the configured environment is test, we should run the code within the tests module.

Inside the tests module we declare a constant which is to be the search string used across each of our tests.

Sidenote 1:

Rust has two kinds of string, for want of a better term, these are known as strings _and _slices. A _string _is of the String type and a _slice _is of the &str type.

In the case of our SEARCH_STRING_SLICE constant it is declared as a slice.

Sidenote 2:

The reason it is declared as a slice initially is because rust cannot compile static Strings since strings are mutable but it can compile static &str slices but this is a side topic which I will be covering in my other series on the Rust programming language in the near future when we tackle the topic of compound types in Rust.

With this constant defined, we implemented our test cases:

  1. A substring that is at the beginning of the search string.
  2. A substring that is in the middle of the search string.
  3. A substring that is at the end of the search string.
  4. A substring that does not exist in the search string.

Sidenote 3:

When we want to run our contains function, which we will implement in the next section, we use super::contains to call it.

This is because super is a module scope keyword which tells rust to look in the parent scope of the module for the function and not in the current tests module scope for example.

If we had defined the contains function in the tests module itself, we would instead use the self::contains syntax.

To run the tests we can run the following via the command line: cargo test.

Implementation

fn contains(haystack: String, needle: String) -> bool {
    if haystack.len() < needle.len() {
        return false;
    }

    if haystack.chars().take(needle.len()).collect::<String>() == needle {
        return true;
    }

    return contains(haystack.chars().skip(1).collect(), needle);
}
Enter fullscreen mode Exit fullscreen mode

For this implementation of the contains function we require two String instances:

  1. The haystack that we will be searching within for the needle
  2. The needle which we will be searching for within the haystack

If the haystack is smaller than the needle, it cannot contain the needle and thus it will return false.

If the haystacks characters from the start of the haystack to the size of the needle is equal to the needle then we return true.

Otherwise we slice off the first character of the haystack and use that as the new haystack to search within for the needle.

A successful substring check could be visualised like so:

Recursive iteration #1:

haystack = "abc"
needle = "bc"

haystack length < needle length ❌
"ab" is equal to the needle ❌

Recursive iteration #2:

haystack = "bc"
needle = "bc"

haystack length < needle length ❌
"bc" is equal to the needle ✔️

Return true
Enter fullscreen mode Exit fullscreen mode

An unsuccessful substring check could be visualised like so:

Recursive iteration #1:

haystack = "abc"
needle = "de"

haystack length < needle length ❌
"ab" is equal to the needle ❌

Recursive iteration #2:

haystack = "bc"
needle = "de"

haystack length < needle length ❌
"bc" is equal to the needle ❌

Recursive iteration #3:

haystack = "c"
needle = "de"

haystack length < needle length ✔️

Return false
Enter fullscreen mode Exit fullscreen mode

We can implement the same recursive functionality using any language, for example with JavaScript it is effectively the same code:

function contains(haystack, needle) {
  if(haystack.length < needle.length) {
    return false;
  }

  if(haystack.slice(0, needle.length) == needle) {
    return true;
  }

  return contains(haystack.slice(1), needle);
}
Enter fullscreen mode Exit fullscreen mode

Conclusions

Compared to some of the other algorithms we have looked at in this series such as generating permutations or approximating PI, this is a relatively simple algorithm. In some sense though it is a more important one to understand since we use strings, substrings, etc on a daily basis and, by and large, take for granted the underlying algorithms at play.

I hope that you found some value in todays post and if you have any questions, comments or suggestions, feel free to leave those in the comments area below the post!

Top comments (0)