Matching Engine =============== Introduction ```````````` Tinder, Google AdWords and Pandora are some examples of businesses whose core is based on matching engine. Matching elements from one set to another has lot of business value, lets consider a few of them: #. Matching resources to jobs #. Matching a product to a potential customer #. Matching customer service representative to a complain customer might have In this article we explore different matching problems, how to abstract them into a form that modern algorithms can solve. Motivating Examples ``````````````````` There are three projects in distinct domains where we've built custom matching engine. Each of the domain had special domain requirements which will be covered in detail #. **Travel**: Matching destinations to traveller #. **Recruiting**: Matching job seeker with recruiting company based both on preference and location proximity #. **Legal**: Matching technical expert that an Attorney can hire for help with Patent Litigation they might be fighting Travel ``````` For a Mobile Application that our client was building they wanted to build a matching engine which will enable users to make informed decisions about their itinerary. Depending on a brief questionnaire users filled in, we build a profile of what profile category the user might fall into. Based on this we look at recommending activities they might do at a location of their choice. In computer science this problem is often reffered to as `Collaborative Filtering `_ There are two variants of this algorithms namely Item Based and User Based. For this project we've implemented User Based. The core idea of Collaborative Filtering is to #. Capture preference of the user #. Find set of users that have similar preference to the requestor #. Recommend items based on preference of similar users At high level it be expressed as: .. code-block:: python from collections import defaultdict # this function is responsible for generating recommendations def get_suggestions(current_user_preference, similar_users): # this dictionary will have lookup of item to score global all_preferences results = defaultdict(float) for candidate_user in similar_users: for candidate_item in all_preferences[candidate_user]: # make sure we're not suggesting item that has # already been rated by current user if candidate_item not in preference: # what is similarity between current user and candidate similarity_score = similar_users[candidate_user] candidate_pref_score = all_preferences[candidate_user][candidate_item] results[candidate_item] += similarity_score * candidate_pref_score return results if __name__ == "__main__": preference = capture_user_preference() similar_users = get_similar_users(preference, all_preferences) suggestions = get_suggestions(preferences, similar_users) sorted_suggestions = reverse_sort_by_score(suggestions, k=10) Few things to note in the code listing above: #. *capture_user_preference*: function is the one that gets some baseline information from current user #. *get_similar_users*: is used to get list of similar users sorted by similarity score. Similarity score can be `Jaccard score `_ #. *all_preferences*: has ratings from reference users whom we're using to profile current user #. *sorted_suggestions*: will have at most **k** suggestions sorted by score Recruiting `````````` Finding the right candidate is often difficult, specially if you're a HR professional dealing with stream of resumes on daily basis. One of our Clients asked us to develop a technical solution for this problem. We build a location based mobile web application that automated matching and documented whole interaction {for algorithmic improvements}. We had implemented an algorithm similar to one mentioned above with an extension to filter results by location proximity. Apache Solr's Geo features were used, upon completion of the process the Interviewer can fill out an evaluation form which would get sent to recruiter instantly. Outcome of this tool were screening time for resumes was reduced, allowing recruiters to focus on important tasks. .. image:: images/xc.png Legal ````` Patent litigations are often complex, require technical expertise to help bring litigations to a resolution. For one of our client we had requirement of matching technical experts to patent that were getting litigated. We had implemented Content Based Matching Engine that is also used in news domain to show you related content. The overall framework for this matching engine was as follows: #. Parse the Technical Expert's resume if we can extract: #. Historical Patent Cases that they were part of #. Patents that might have been granted #. Technical Publications that they might have publishes #. Build feature vector representation of experts resume #. Build feature vector representation of cases {as collection of Patents in context} #. Compute Similarity score over pairs of Resumes and Cases #. Sort and Return Results Other Variations ```````````````` There are other variations of Matching algorithms that are used in our daily lives: #. Microsoft's `TrueSkill `_ #. `NRMP Algorithm `_