Matching Engine¶
Introduction¶
Tinder, Google AdWords and Pandora are some examples of businesses whose core is based on matching engine. Matching elements from one set to another has lot of business value, lets consider a few of them:
Matching resources to jobs
Matching a product to a potential customer
Matching customer service representative to a complain customer might have
In this article we explore different matching problems, how to abstract them into a form that modern algorithms can solve.
Motivating Examples¶
There are three projects in distinct domains where we’ve built custom matching engine. Each of the domain had special domain requirements which will be covered in detail
Travel: Matching destinations to traveller
Recruiting: Matching job seeker with recruiting company based both on preference and location proximity
Legal: Matching technical expert that an Attorney can hire for help with Patent Litigation they might be fighting
Travel¶
For a Mobile Application that our client was building they wanted to build a matching engine which will enable users to make informed decisions about their itinerary.
Depending on a brief questionnaire users filled in, we build a profile of what profile category the user might fall into. Based on this we look at recommending activities they might do at a location of their choice.
In computer science this problem is often reffered to as Collaborative Filtering There are two variants of this algorithms namely Item Based and User Based. For this project we’ve implemented User Based. The core idea of Collaborative Filtering is to
Capture preference of the user
Find set of users that have similar preference to the requestor
Recommend items based on preference of similar users
At high level it be expressed as:
from collections import defaultdict
# this function is responsible for generating recommendations
def get_suggestions(current_user_preference, similar_users):
# this dictionary will have lookup of item to score
global all_preferences
results = defaultdict(float)
for candidate_user in similar_users:
for candidate_item in all_preferences[candidate_user]:
# make sure we're not suggesting item that has
# already been rated by current user
if candidate_item not in preference:
# what is similarity between current user and candidate
similarity_score = similar_users[candidate_user]
candidate_pref_score = all_preferences[candidate_user][candidate_item]
results[candidate_item] += similarity_score * candidate_pref_score
return results
if __name__ == "__main__":
preference = capture_user_preference()
similar_users = get_similar_users(preference, all_preferences)
suggestions = get_suggestions(preferences, similar_users)
sorted_suggestions = reverse_sort_by_score(suggestions, k=10)
Few things to note in the code listing above:
capture_user_preference: function is the one that gets some baseline information from current user
get_similar_users: is used to get list of similar users sorted by similarity score. Similarity score can be Jaccard score
all_preferences: has ratings from reference users whom we’re using to profile current user
sorted_suggestions: will have at most k suggestions sorted by score
Recruiting¶
Finding the right candidate is often difficult, specially if you’re a HR professional dealing with stream of resumes on daily basis. One of our Clients asked us to develop a technical solution for this problem. We build a location based mobile web application that automated matching and documented whole interaction {for algorithmic improvements}.
We had implemented an algorithm similar to one mentioned above with an extension to filter results by location proximity. Apache Solr’s Geo features were used, upon completion of the process the Interviewer can fill out an evaluation form which would get sent to recruiter instantly.
Outcome of this tool were screening time for resumes was reduced, allowing recruiters to focus on important tasks.
Legal¶
Patent litigations are often complex, require technical expertise to help bring litigations to a resolution. For one of our client we had requirement of matching technical experts to patent that were getting litigated. We had implemented Content Based Matching Engine that is also used in news domain to show you related content. The overall framework for this matching engine was as follows:
- Parse the Technical Expert’s resume if we can extract:
Historical Patent Cases that they were part of
Patents that might have been granted
Technical Publications that they might have publishes
Build feature vector representation of experts resume
Build feature vector representation of cases {as collection of Patents in context}
Compute Similarity score over pairs of Resumes and Cases
Sort and Return Results
Other Variations¶
There are other variations of Matching algorithms that are used in our daily lives:
Microsoft’s TrueSkill