Implement strStr()

Implement strStr().Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack

Input: text = "aaaaa", pattern = "bba"
Output: -1

Approach1: Brute Force

When tackling problems of this sort, an obvious method for doing a substring search would be to check, for each possible position in the text at which the pattern could match, whether it does indeed match.

We will keep one pointer (i) on the text and another pointer (j) into the pattern.
For each i, j resets to 0 (starts fresh to look for pattern) and gets incremented until it finds a mismatch or utlimately finds a pattern (j == n)
if we reach the end of the text (i == m - n + 1), we can return -1 thereby indicating we have not found the specified pattern within the text.
The reason the loop terminates at i == m - n + 1 opposed to i == m is due to the fact that if i has n - k elements left in the loop where k > 0 therefore it is mathematically and practically imopossible to identify a pattern at that point that would be of size n or greater.

class Solution(object):
    def strStr(self, haystack: str, needle: str) -> int:
        if needle == "":
            return 0
        m,n = len(haystack), len(needle)
        
        for i in range(m - n + 1):
            for j in range(n):
                if haystack[i + j] != needle[j]:
                    break
                if j == n - 1:
                    return i
        return -1

Time: O(m * n) Space: O(1)

Real Life applications:

Plagiarism Detection: I'm sure that we all remember that back in college, some professors wanted our essays to be emailed as well as printed out. It so happens that our essays were being compared and decomposed into string tokens and compared using string matching algorithms.(They probably implement some of the most optimized versions as mentionned below)
Honorable mentions: Bioinformatics and DNA Sequencing, Spam Filters, Spelling Checker

Why the Improvements ?

Let's say we're hypothetically dealing with an immense array of texts that contains long string of As. If the pattern that we want to identify also begin with long strings of A's, then finding the pattern will be slow. In that scenario, we probably would want to implement a faster algorithm. (Thankfully) We do not have to start the research ourselves as over the years, brilliant individuals have come up with different realizations that we can tap into in order to optimize our implementation. The different variations would be KMP, Boyer-Moore, Rabin-Karp and lastly the Z-algorithm.

Each of these algorithms presents their own strength and weaknesses based of the nature of the data they are dealing with but they will be nonetheless faster than the Brute-force.

Frankly, I find KMP the one I most often remember however I'd consider the Rabin-Karp, most pragmatic as you can tune it to your needs or extract it's underlying technique and use as an arsenal for various problems.

Approach2: Rabin-Karp (Monte Carlo) Java Version

Calculate Hash for pattern
Calculate Hash for 1st Windown in tent
Repeat the below until loops ends:
- if hash(pattern) == hash(text) then match characters one by one because this algorithm can output a correct answer with a small probability(Monte Carlo). To further prevent the probability of collisions(or if your belief in probability theory is faint), we take the remainder of the output after being divided by a 31 bit prime number. Prime number is used to spread the distribution of hash output consequently reducing rate of collisions.
- Substract leftmost from hash(text)
- Shift entire hash(text) by unit to the right
- Add new Character to the window
Ultimately, we should retain that the competitive advantage of Rabin Karp is it's ability to reuse already calculated hash value in constant time (from which it derives it's name). A more straightforward but less efficent alternative is to recalculate the hash function of the characters that we're pointing to at each stop thereby rendering the algorithm more costly than the Brute-force.

import java.util.*;
import java.math.*;

class Solution{
 private final static long Q = primeNumber();
 private final static int R = 256;
 long RM = 1;
 
 //Rabin-Karp algorithm
 public int strStr(String haystack, String needle){
  int n = haystack.length();
  int m = needle.length();
  
  for(int i = 1; i <= m - 1; i++){
   RM = (RM * R) % Q;
   }
   
 if(m > n ) return - 1;
 
    long textHash = hash(haystack, n);
    long patternHash = hash(needle, m);
    
    if((textHash == patternHash) && check(haystack, needle, 0)){
       return 0;
       }
       
   for(int i = m; i < n; i++){
     textHash = (textHash + Q  - RM * haystack.charAt(i - m) % Q) % Q;
     textHash = (textHash * R + haystack.charAt(i) % Q);
     int offset = i - m + 1;
     if((textHash == patternHash) && check(haystack, needle, offset)){
       return offset;
         }
        }
        return n; 
      }
      
   private boolean check(String hayStack, String needle, int i){
     for(int j = 0; j < needle.length(); j++)
       if(needle.charAt(j) != hayStack.charAt(i + j))return false;
     return true;
     }
     
   private long hash(String t, int m){
       long h = 0;
       for(int i = 0; i < m; i++){
         h = (h * R + t.charAt(i)) % Q;
         }
       return h;
       }
       
   private static long primeNumber(){
       return BigInteger.probablePrime(31, new Random()).longValue();
   }

Time: O(m + n) Space: O(1)

PreviousOnline Stock Span Next2Sum

Last updated 3 years ago

Was this helpful?

class Solution(object): def strStr(self, haystack: str, needle: str) -> int: if needle == "": return 0 m,n = len(haystack), len(needle) for i in range(m - n + 1): for j in range(n): if haystack[i + j] != needle[j]: break if j == n - 1: return i return -1

import java.util.*; import java.math.*; class Solution{ private final static long Q = primeNumber(); private final static int R = 256; long RM = 1; //Rabin-Karp algorithm public int strStr(String haystack, String needle){ int n = haystack.length(); int m = needle.length(); for(int i = 1; i <= m - 1; i++){ RM = (RM * R) % Q; } if(m > n ) return - 1; long textHash = hash(haystack, n); long patternHash = hash(needle, m); if((textHash == patternHash) && check(haystack, needle, 0)){ return 0; } for(int i = m; i < n; i++){ textHash = (textHash + Q - RM * haystack.charAt(i - m) % Q) % Q; textHash = (textHash * R + haystack.charAt(i) % Q); int offset = i - m + 1; if((textHash == patternHash) && check(haystack, needle, offset)){ return offset; } } return n; } private boolean check(String hayStack, String needle, int i){ for(int j = 0; j < needle.length(); j++) if(needle.charAt(j) != hayStack.charAt(i + j))return false; return true; } private long hash(String t, int m){ long h = 0; for(int i = 0; i < m; i++){ h = (h * R + t.charAt(i)) % Q; } return h; } private static long primeNumber(){ return BigInteger.probablePrime(31, new Random()).longValue(); }