Build your own: Fuzzy String Search

Have you ever tried searching names in Mendix? It is often a challenge to get the right results or even any results at all. Names can be misheared, typos are common and names can be represented in different variants (legal names, common names, given names). Causing many different variations for just one string, matching them against the existing data might not be straightforward. What we need is ‘fuzzy string matching’ which looks for approximately rather than exact matches.

This how-to will describe the steps to build your own fuzzy string search functionality in Mendix and PostgreSQL.

Basic setup

Mendix apps in the Mendix cloud run on a PostgreSQL database. PostgreSQL comes with the fuzzystrmatch module which provides several functions to determine similarities and distance between strings. The first step is to enable this extension.

By using the query APIs built into the Mendix platform we can leverage the functions available in PostgreSQL and write our search queries.

Functions

The fuzzy string match module in Postgres offers different functions for string matching:

  • Soundex - a method of matching similar-sounding names by converting them to the same code
  • Daitch-Mokotoff Soundex - matches similar-sounding names by converting them to the same code. However, Daitch-Mokotoff Soundex is significantly more useful for non-English names than the original system
  • Levenshtein - calculates the Levenshtein distance between two strings, based on a cost for each manipulation required
  • Metaphone - like Soundex, is based on the idea of constructing a representative code for an input string. Two strings are then deemed similar if they have the same codes
  • Double Metaphone - computes two “sounds like” strings for a given input string — a “primary” and an “alternate”. In most cases they are the same, but for non-English names especially they can be a bit different, depending on pronunciation

Read the PostgreSQL documentation for fuzzystrmatch to see which function fits your use case the best.

Enable fuzzystrmatch on PostgreSQL

Execute the following SQL statement in the after start-up logic of the Mendix project: CREATE EXTENSION IF NOT EXISTS fuzzystrmatch; It will create the extension if it does not already exist. In other words, it enables this feature on the PostgreSQL database.

Example code for JA_ExecuteSQL_CreateExtension.java

Create a Java Action in your project containing the code below and call that Java Action in your After Startup flow.

JA_ExecuteSQL_CreateExtension.java
// This file was generated by Mendix Studio Pro.
//
// WARNING: Only the following code will be retained when actions are regenerated:
// - the import list
// - the code between BEGIN USER CODE and END USER CODE
// - the code between BEGIN EXTRA CODE and END EXTRA CODE
// Other code you write will be lost the next time you deploy the project.
// Special characters, e.g., é, ö, à, etc. are supported in comments.

package school.actions;

import com.mendix.core.Core;
import com.mendix.logging.ILogNode;
import com.mendix.systemwideinterfaces.core.IContext;
import com.mendix.webui.CustomJavaAction;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Runs SQL statement on database:
 * 
 * CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
 */
public class JA_ExecuteSQL_CreateExtension extends CustomJavaAction<java.lang.Boolean>
{
	public JA_ExecuteSQL_CreateExtension(IContext context)
	{
		super(context);
	}

	@java.lang.Override
	public java.lang.Boolean executeAction() throws Exception
	{
		// BEGIN USER CODE
		Boolean done = Core.dataStorage().executeWithConnection(connection ->
	        {
				try {
					PreparedStatement stmt = connection.prepareStatement("CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;");
					stmt.executeUpdate();
				}
				catch (SQLException e) {
					logger.error("SQL Query failed: " + e);
					return false;
				}
				return true;
	        });
		return done;
		// END USER CODE
	}

	/**
	 * Returns a string representation of this action
	 * @return a string representation of this action
	 */
	@java.lang.Override
	public java.lang.String toString()
	{
		return "JA_ExecuteSQL_CreateExtension";
	}

	// BEGIN EXTRA CODE
	private final ILogNode logger = Core.getLogger(this.getClass().getName());
	// END EXTRA CODE
}

Create the search input

Allow the user to provide the search input (e.g. enter the name) or allow a search to be executed from existing data (e.g. by calling a microflow with an object that contains the data to search for).

Execute search query

Execute the SQL query through a Java action. The code performs two actions:

  1. Retrieve the result of the SQL query, which needs to be a list of Mendix IDs. This utilizes the Mendix query API: Core.dataStorage().executeWithConnection and PreparedStatement stmt = connection.prepareStatement();
  2. Use the result of the previous step (a list of Mendix IDs) to retrieve the actual Mendix objects via Core.retrieveIdList();

Since the input is user-generated we want to make sure it’s safe to use on the database. This can be done by defining the input as parameter values for the query, don’t concatenate the input string with the query, else Little Bobby Tables might cause some trouble!

The Java Action will return a list of objects as the result.

Example code for execute query

Create a Java Action in your project containing the code below and call that Java Action when you want to execute a search query.

JA_RetrieveAdvancedSqlByID.java
// This file was generated by Mendix Studio Pro.
//
// WARNING: Only the following code will be retained when actions are regenerated:
// - the import list
// - the code between BEGIN USER CODE and END USER CODE
// - the code between BEGIN EXTRA CODE and END EXTRA CODE
// Other code you write will be lost the next time you deploy the project.
// Special characters, e.g., é, ö, à, etc. are supported in comments.

package school.actions;

import com.mendix.core.Core;
import com.mendix.core.CoreException;
import com.mendix.logging.ILogNode;
import com.mendix.systemwideinterfaces.MendixRuntimeException;
import com.mendix.systemwideinterfaces.core.IContext;
import com.mendix.systemwideinterfaces.core.IMendixIdentifier;
import com.mendix.systemwideinterfaces.core.IMendixObject;
import com.mendix.webui.CustomJavaAction;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/**
 * Executes the SQL query:
 * - the SQL query should only return the object ID
 * - the java action will use that list of ID's to retrieve the objects from database
 * 
 * Input:
 * - The SQL query
 * - The expected return type (e.g. CaseManagement.CaseObject)
 * 
 * 
 */
public class JA_RetrieveAdvancedSqlByID extends CustomJavaAction<java.util.List<IMendixObject>>
{
	private java.lang.String Sql;
	private java.lang.String ReturnEntity;
	private java.lang.String strValue1;

	public JA_RetrieveAdvancedSqlByID(IContext context, java.lang.String Sql, java.lang.String ReturnEntity, java.lang.String strValue1)
	{
		super(context);
		this.Sql = Sql;
		this.ReturnEntity = ReturnEntity;
		this.strValue1 = strValue1;
	}

	@java.lang.Override
	public java.util.List<IMendixObject> executeAction() throws Exception
	{
		// BEGIN USER CODE
		logger.debug("executeAction: " + this.Sql);
		
        List<IMendixObject> resultList = null;
        resultList = Core.dataStorage().executeWithConnection(connection ->
        {
        	List<IMendixObject> objects = new ArrayList<IMendixObject>();
        	List<IMendixIdentifier> resultIDs = new ArrayList<IMendixIdentifier>();
        	try {
                PreparedStatement stmt = connection.prepareStatement(this.Sql);
				stmt.setString(1, this.strValue1);
				stmt.setString(2, this.strValue1);
                ResultSet rset = stmt.executeQuery();
                while(rset.next()) {
                	resultIDs.add(Core.createMendixIdentifier(rset.getLong(1)));
                }
        	}
        	 catch (SQLException e) {
        		 logger.error("SQL Query failed: " + e);
                 throw new MendixRuntimeException(e);
            }    	
        	try {
        		objects = Core.retrieveIdList(getContext(), resultIDs);
        	}
        	catch (CoreException e) {
        		logger.error("Failed to retrieve objects by id: " + e.getMessage());
			}
        	return objects;
        });
        return resultList;
		// END USER CODE
	}

	/**
	 * Returns a string representation of this action
	 * @return a string representation of this action
	 */
	@java.lang.Override
	public java.lang.String toString()
	{
		return "JA_RetrieveAdvancedSqlByID";
	}

	// BEGIN EXTRA CODE
	private final ILogNode logger = Core.getLogger(this.getClass().getName());
	// END EXTRA CODE
}

In the example project we use the Levensthein function to determine the distance between two strings. It uses the following query, which contains two parameter values indicated by the ?. The max distance (or cost) in this query is hardcoded to 8, but that can be any value or the outcome of a function; i.e. the length of the search input divided by two, maxed to 8 min(ceil(length($SearchHelper/SearchInput) div 2),8).

SELECT s.id
FROM school$student s
WHERE LEVENSHTEIN("fullname",?) < 8
ORDER BY LEVENSHTEIN("fullname",?) ASC 

Did you know?

You can utilize other functions for fuzzy string matching in your search as well as a combination of functions to get the best result

Process the result

Process the result in the microflow (e.g. remove the current object from the result list, to avoid displaying the search input as a search result) and make it visible to the user.

Demo project

To try the Levenshtein function in a demo project, go to https://fuzzystringsearch-sandbox.mxapps.io/.

Screenshots from Studio Pro the example project:

Domain model with an entity for Student and an entity for SearchHelper

Microflow which calculates the max distance, creates the query and then executes the query through a java action call

Properties of the call Java Action

Further considerations

  • Alternatives for the fuzzystrmatch are pg_trim and pg_similarity. You can enable pg_trim with the CREATE EXTENSION command but pg_similarity would require you to install it yourself (not possible in Mendix cloud)
  • Optimization: Certain functions allow pre-calculation. For example, Metaphone codes can be pre-calculated and stored with the existing data.
  • Security: The SQL queries executed on the database are not covered by the Mendix security model, which means we need to apply security constraints to the query
  • Other PostgreSQL extensions