Matching Engine
===============

Introduction
````````````
Tinder, Google AdWords and Pandora are some examples of businesses whose core
is based on matching engine. Matching elements from one set to another has lot
of business value, lets consider a few of them:

#. Matching resources to jobs
#. Matching a product to a potential customer
#. Matching customer service representative to a complain customer might have

In this article we explore different matching problems, how to abstract them
into a form that modern algorithms can solve.

Motivating Examples
```````````````````

There are three projects in distinct domains where we've built custom
matching engine. Each of the domain had special domain requirements which will
be covered in detail

#. **Travel**: Matching destinations to traveller
#. **Recruiting**: Matching job seeker with recruiting company based 
   both on preference and location proximity
#. **Legal**: Matching technical expert that an Attorney can hire for help
   with Patent Litigation they might be fighting

Travel 
```````
For a Mobile Application that our client was building they wanted to
build a matching engine which will enable users to make informed
decisions about their itinerary. 

Depending on a brief questionnaire users filled in, we build a profile
of what profile category the user might fall into. Based on this
we look at recommending activities they might do at a location of their choice.

In computer science this problem is often reffered to as `Collaborative Filtering <https://en.wikipedia.org/wiki/Collaborative_filtering>`_
There are two variants of this algorithms namely Item Based and User Based.
For this project we've implemented User Based. The core idea of Collaborative
Filtering is to

#. Capture preference of the user
#. Find set of users that have similar preference to the requestor
#. Recommend items based on preference of similar users

At high level it be expressed as:

.. code-block:: python

   from collections import defaultdict

   # this function is responsible for generating recommendations
   def get_suggestions(current_user_preference, similar_users):
       # this dictionary will have lookup of item to score
       global all_preferences

       results = defaultdict(float)
       for candidate_user  in similar_users:
           for candidate_item in all_preferences[candidate_user]:
               # make sure we're not suggesting item that has
               # already been rated by current user
               if candidate_item not in preference:
                   # what is similarity between current user and candidate
                   similarity_score = similar_users[candidate_user]
                   candidate_pref_score = all_preferences[candidate_user][candidate_item]
                   results[candidate_item] += similarity_score * candidate_pref_score
       return results

   if __name__ == "__main__":
       preference = capture_user_preference()
       similar_users = get_similar_users(preference, all_preferences)
       suggestions = get_suggestions(preferences, similar_users)
       sorted_suggestions = reverse_sort_by_score(suggestions, k=10)

Few things to note in the code listing above:

#. *capture_user_preference*: function is the one that gets some baseline
   information from current user
#. *get_similar_users*: is used to get list of similar users sorted by similarity score.
   Similarity score can be `Jaccard score <https://en.wikipedia.org/wiki/Jaccard_index>`_
#. *all_preferences*: has ratings from reference users whom we're using to profile current user
#. *sorted_suggestions*: will have at most **k** suggestions sorted by score

Recruiting
``````````

Finding the right candidate is often difficult, specially if you're a
HR professional dealing with stream of resumes on daily basis. One of 
our Clients asked us to develop a technical solution for this problem.
We build a location based mobile web application that automated matching and
documented whole interaction {for algorithmic improvements}.

We had implemented an algorithm similar to one mentioned above with an extension
to filter results by location proximity. Apache Solr's Geo features were used,
upon completion of the process the Interviewer can fill out an evaluation form
which would get sent to recruiter instantly.

Outcome of this tool were screening time for resumes was reduced, allowing
recruiters to focus on important tasks. 

.. image:: images/xc.png

Legal
`````
Patent litigations are often complex, require technical expertise to help
bring litigations to a resolution. For one of our client we had requirement
of matching technical experts to patent that were getting litigated. 
We had implemented Content Based Matching Engine that is also used in news
domain to show you related content. The overall framework for this matching
engine was as follows:

#. Parse the Technical Expert's resume if we can extract:
    #. Historical Patent Cases that they were part of
    #. Patents that might have been granted
    #. Technical Publications that they might have publishes
#. Build feature vector representation of experts resume
#. Build feature vector representation of cases {as collection of Patents in context}
#. Compute Similarity score over pairs of Resumes and Cases
#. Sort and Return Results

Other Variations
````````````````

There are other variations of Matching algorithms that are used in our daily lives:

#. Microsoft's `TrueSkill <https://www.microsoft.com/en-us/research/project/trueskill-ranking-system/>`_
#. `NRMP Algorithm <https://www.youtube.com/watch?v=kvgfgGmemdA>`_