Build your own: Fuzzy String Search
Have you ever tried searching names in Mendix? It is often a challenge to get the right results or even any results at all. Names can be misheared, typos are common and names can be represented in different variants (legal names, common names, given names). Causing many different variations for just one string, matching them against the existing data might not be straightforward. What we need is ‘fuzzy string matching’ which looks for approximately rather than exact matches.
This how-to will describe the steps to build your own fuzzy string search functionality in Mendix and PostgreSQL.
Basic setup
Mendix apps in the Mendix cloud run on a PostgreSQL database. PostgreSQL comes with the fuzzystrmatch module which provides several functions to determine similarities and distance between strings. The first step is to enable this extension.
By using the query APIs built into the Mendix platform we can leverage the functions available in PostgreSQL and write our search queries.
Functions
The fuzzy string match module in Postgres offers different functions for string matching:
- Soundex - a method of matching similar-sounding names by converting them to the same code
- Daitch-Mokotoff Soundex - matches similar-sounding names by converting them to the same code. However, Daitch-Mokotoff Soundex is significantly more useful for non-English names than the original system
- Levenshtein - calculates the Levenshtein distance between two strings, based on a cost for each manipulation required
- Metaphone - like Soundex, is based on the idea of constructing a representative code for an input string. Two strings are then deemed similar if they have the same codes
- Double Metaphone - computes two “sounds like” strings for a given input string — a “primary” and an “alternate”. In most cases they are the same, but for non-English names especially they can be a bit different, depending on pronunciation
Read the PostgreSQL documentation for fuzzystrmatch to see which function fits your use case the best.
Building the fuzzy search
Enable fuzzystrmatch on PostgreSQL
Execute the following SQL statement in the after start-up logic of the Mendix project: CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
It will create the extension if it does not already exist. In other words, it enables this feature on the PostgreSQL database.
Example code for JA_ExecuteSQL_CreateExtension.java
Create a Java Action in your project containing the code below and call that Java Action in your After Startup flow.
Create the search input
Allow the user to provide the search input (e.g. enter the name) or allow a search to be executed from existing data (e.g. by calling a microflow with an object that contains the data to search for).
Execute search query
Execute the SQL query through a Java action. The code performs two actions:
- Retrieve the result of the SQL query, which needs to be a list of Mendix IDs. This utilizes the Mendix query API:
Core.dataStorage().executeWithConnection
andPreparedStatement stmt = connection.prepareStatement();
- Use the result of the previous step (a list of Mendix IDs) to retrieve the actual Mendix objects via
Core.retrieveIdList();
Since the input is user-generated we want to make sure it’s safe to use on the database. This can be done by defining the input as parameter values for the query, don’t concatenate the input string with the query, else Little Bobby Tables might cause some trouble!
The Java Action will return a list of objects as the result.
Example code for execute query
Create a Java Action in your project containing the code below and call that Java Action when you want to execute a search query.
In the example project we use the Levensthein function to determine the distance between two strings. It uses the following query, which contains two parameter values indicated by the ?
. The max distance (or cost) in this query is hardcoded to 8, but that can be any value or the outcome of a function; i.e. the length of the search input divided by two, maxed to 8 min(ceil(length($SearchHelper/SearchInput) div 2),8)
.
Did you know?
You can utilize other functions for fuzzy string matching in your search as well as a combination of functions to get the best result
Process the result
Process the result in the microflow (e.g. remove the current object from the result list, to avoid displaying the search input as a search result) and make it visible to the user.
Demo project
To try the Levenshtein function in a demo project, go to https://fuzzystringsearch-sandbox.mxapps.io/.
Screenshots from Studio Pro the example project:
Further considerations
- Alternatives for the
fuzzystrmatch
are pg_trim and pg_similarity. You can enablepg_trim
with theCREATE EXTENSION
command butpg_similarity
would require you to install it yourself (not possible in Mendix cloud) - Optimization: Certain functions allow pre-calculation. For example, Metaphone codes can be pre-calculated and stored with the existing data.
- Security: The SQL queries executed on the database are not covered by the Mendix security model, which means we need to apply security constraints to the query
- Other PostgreSQL extensions