Useful Recipes

Following is a collection of Python snippets that we’ve discovered along the way

RegEx

Places where regex are used extensively

  1. Validation Rules {E.g. Email, Phone, Addresses}
  2. Scraping {E.g. Extracting Prices, Text Snippets}
  3. Translation {E.g. Replacing all upper cases to lower case}
  4. Parsing Logs {E.g. Parsing Nginx and Apache logs}

Things to keep in mind while working with RegEx

  1. RegEx quantifiers are greedy by default {That means they match as much text as possible while still satisfying the pattern, even when a shorter match would have been sufficient. Appending ? to a quantifier, e.g. .*?, makes it non-greedy.}
  2. When writing regex, make sure you test both positive and negative test cases
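
To make the greedy behaviour concrete, here is a minimal sketch that runs a greedy and a non-greedy pattern over the same string. The sample string and variable names are made up for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" grabs as much as it can, so the single match runs from
# the first "<" to the last ">".
greedy = re.findall(r"<.*>", html)

# Non-greedy: ".*?" stops at the first ">" that completes the pattern.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```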

Handy resources:

  1. RegEx 101 : Handy tool for interactively testing regex
  2. RegEx Cheatsheet
  3. RegEx Howto
  4. RegEx Golf
  5. Using Regular Expressions in Python 3

Grammar

Characters

  1. \d : Digits {0-9}
  2. \w : Word character; this includes alphabets {both upper and lower case}, digits and underscore
  3. \s : White space {including tabs, new lines and carriage returns}

Note: The capitalized forms of the above match the inverse. E.g. ‘\D’ will match anything that is not a digit

Quantifiers

  1. . : Matches any single character {except a newline, by default}
  2. ? : Optional match {zero or one occurrence}
  3. + : Match at least one occurrence
  4. {x} : Matches the preceding expression exactly x times
  5. {x,y} : Matches the preceding expression between x and y times

Logic

  1. | : Alternation; matches either of the sub-expressions on its two sides
  2. (…) : Encapsulates sub-expressions in a group
  3. ^ : If used at the start of an expression it means “at the start”
  4. ^ : If used as the first character inside a character class {e.g. [^a-z]} it negates the class

Class Characters

  1. []: Matches any one of the characters listed inside the brackets
  2. [a-z]: Matches any one character in the range a to z

findall

Official re module doc, also contains good number of examples

import re
text_snippet = "there was a PEACH who PINCH, in return punch were flying around"
# re.compile compiles a regex into a pattern object
# this makes it easier to work with regex
# re.IGNORECASE is a flag, you can have multiple such flags
pch_regex = re.compile(r"p.{1,3}ch", re.IGNORECASE)
for current_match in pch_regex.findall(text_snippet):
    print(current_match)

finditer

import re
text_snippet = "there was a PEACH who PINCH, in return punch were flying around"
pch_regex = re.compile(r"p.{1,3}ch", re.IGNORECASE)
for current_match in pch_regex.finditer(text_snippet):
    print("Starts at:%d, Ends at:%d" % (current_match.start(), current_match.end()))

sub

import re
text_snippet = "there was a PEACH who PINCH, in return punch were flying around"
pch_regex = re.compile(r"p.{1,3}ch", re.IGNORECASE)
# every match of the compiled pattern is replaced with "_"
text_snippet_translated = pch_regex.sub("_", text_snippet)
print(text_snippet_translated)

List

Sorting nested list by length

>>> x = [[1], [1,2,3,4,5], [1,2,3]]
>>> sorted(x, key=len, reverse=True)
[[1, 2, 3, 4, 5], [1, 2, 3], [1]]

Collections

Frequencies using Counter

from collections import Counter
x = [1, 2, 3, 4, 5, 6, 7, 1, 2, 1, 2, 1]

Counter(x).most_common(3)

Dictionaries

Sorting by Value

You may want to iterate over a dictionary sorted by value; this can be achieved using the sorted function, which returns a list of (key, value) pairs

import operator
d = {'a': 3, 'b': 1, 'c': 2}  # example data
sorted_d = sorted(d.items(), key=operator.itemgetter(1))

Reference: How to sort a dictionary by values in Python

Default Dictionaries

Default dictionary allows you to perform operations, without having to check for membership

from collections import defaultdict
counts = defaultdict(int)
counts['foo'] += 1
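
As a rough comparison, the sketch below groups words by their first letter twice: once with a plain dictionary and an explicit membership check, and once with defaultdict(list). The word list is made up for illustration.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Plain dict: every insert needs a membership check first.
plain = {}
for word in words:
    if word[0] not in plain:
        plain[word[0]] = []
    plain[word[0]].append(word)

# defaultdict: a missing key starts out as an empty list automatically.
grouped = defaultdict(list)
for word in words:
    grouped[word[0]].append(word)

print(dict(grouped) == plain)  # True
```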

Dictionary Comprehension

d = {n: n**2 for n in range(5)}

Object Oriented Programming

StateMachine

The concept of state is central to computer science. At times we’d want to implement a state machine in an object-oriented design. Following is how to do it

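class State:
    def __init__(self):
        pass
    def run(self):
        # subclasses must override; assert would be stripped under python -O
        raise NotImplementedError("run not implemented")
    def next(self, input):
        raise NotImplementedError("next not implemented")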

Things to note:

  1. __init__, the initializer, can be used to set the initial state of the state machine
  2. The next method takes an input and decides whether the state changes or remains the current state. Validation rules can also be implemented as part of this method
  3. The run() method is used to execute the state
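
To show how the three methods fit together, here is a minimal sketch of a two-state turnstile built on the same interface. The Locked and Unlocked classes, the event names and the simulate helper are all hypothetical and only serve the illustration.

```python
class State:
    """Base class mirroring the State interface above."""

    def run(self):
        raise NotImplementedError("run not implemented")

    def next(self, event):
        raise NotImplementedError("next not implemented")


class Locked(State):
    def run(self):
        return "locked"

    def next(self, event):
        # A coin unlocks the turnstile; anything else keeps it locked.
        return Unlocked() if event == "coin" else self


class Unlocked(State):
    def run(self):
        return "unlocked"

    def next(self, event):
        # Pushing through locks it again; extra coins are ignored.
        return Locked() if event == "push" else self


def simulate(events, start=None):
    """Feed events through the machine and record each state visited."""
    state = start or Locked()
    trace = [state.run()]
    for event in events:
        state = state.next(event)
        trace.append(state.run())
    return trace


print(simulate(["push", "coin", "coin", "push"]))
# ['locked', 'locked', 'unlocked', 'unlocked', 'locked']
```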

Reference: StateMachine

Inversion of Control

Associated with the concept of IoC is that of Coupling. Think of it as a kind of vendor lock-in.

Concretely speaking, let’s say

  1. You create a base class Engine
  2. When inheriting from this class for DieselEngine, GasolineEngine and ElectroEngine, we’re creating a coupling {between these classes and the base class Engine}

Following is what inversion of control would look like

class Car(object):
    """Example car."""

    def __init__(self, engine):
        """Initializer."""
        self._engine = engine  # Engine is injected

Here we’re passing an instance of the engine to the initializer {as opposed to instantiating it inside Car}
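
A minimal sketch of what injection buys you: the same Car works with any engine object that offers the method it calls. The start() method and the two engine classes are assumptions made for this example.

```python
class GasolineEngine:
    def start(self):
        return "gasoline engine started"


class ElectroEngine:
    def start(self):
        return "electro engine started"


class Car:
    def __init__(self, engine):
        self._engine = engine  # Engine is injected, not created here

    def start(self):
        # Car only relies on the engine having a start() method.
        return self._engine.start()


gasoline_car = Car(GasolineEngine())
electro_car = Car(ElectroEngine())
print(gasoline_car.start())  # gasoline engine started
print(electro_car.start())   # electro engine started
```

Swapping the engine needs no change to Car, which also makes it easy to pass a fake engine in tests.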

Reference: Dependency injection and inversion of control in Python

Facade Pattern

Facade allows you to consolidate functionality. It lets you unify the functionality of many different objects or services into one simple API.

class Car(object):

    def __init__(self):
        self._tyres = [Tyre('front_left'),
                            Tyre('front_right'),
                            Tyre('rear_left'),
                            Tyre('rear_right'), ]
        self._tank = Tank(70)

    def tyres_pressure(self):
        return [tyre.pressure for tyre in self._tyres]

    def fuel_level(self):
        return self._tank.level

Think of them as Aggregators. Related to them are Proxy and Adapter

Adapter

Adapter is about altering interfaces. It allows you to wrap an object/class so that it implements the methods you’re expecting. E.g. you’ve written a logger which takes a destination as a parameter and expects the destination to have a write method, but a socket doesn’t have a write method. You can write an adapter for it as below

import socket
from datetime import datetime

class SocketWriter(object):

    def __init__(self, ip, port):
        self._socket = socket.socket(socket.AF_INET,
                                    socket.SOCK_DGRAM)
        self._ip = ip
        self._port = port

    def write(self, message):
        # UDP is connectionless, so use sendto with an address; sockets send bytes
        self._socket.sendto(message.encode('utf-8'), (self._ip, self._port))

def log(message, destination):
    destination.write('[{}] - {}'.format(datetime.now(), message))

# port must be an integer
udp_logger = SocketWriter('1.2.3.4', 9999)
log('Something happened', udp_logger)

Reference: Design Patterns in Python

Singleton

Singletons are used when you only want to have single instance of an object. They are usually useful for configuration and logging aspects of application.

class Logger(object):
    def __new__(cls, *args, **kwargs):
        if not hasattr(cls, '_logger'):
            # object.__new__ accepts no extra arguments, so pass only cls
            cls._logger = super(Logger, cls).__new__(cls)
        return cls._logger

Avoid too many uses of Singleton
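
One thing to watch for with a __new__ based singleton is that __init__ still runs on every call, even though the same instance is returned. The sketch below illustrates this; the init_calls counter is hypothetical and only there to make the behaviour visible.

```python
class Logger:
    def __new__(cls):
        if not hasattr(cls, "_logger"):
            cls._logger = super().__new__(cls)
        return cls._logger

    def __init__(self):
        # Runs on every Logger() call, even though the instance is reused.
        self.init_calls = getattr(self, "init_calls", 0) + 1


first = Logger()
second = Logger()
print(first is second)    # True
print(first.init_calls)   # 2
```

If the initializer sets up state such as open file handles, it may need a guard so that the setup happens only once.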

Mixins

Composition over Inheritance has been a key concept in modern programming. Mixins allow you to achieve composition; think of them as similar to Interfaces in Java {or Protocols in Swift}.

class BaseClass(object):
    pass

class Mixin1(object):
    def test(self):
        print("Mixin1")

class Mixin2(object):
    def test(self):
        print("Mixin2")

class MyClass(Mixin2, Mixin1, BaseClass):
    pass

x = MyClass()
x.test()

Things to note:

  1. BaseClass is on the right as opposed to the left; this is because of Method Resolution Order {MRO}, which searches the listed base classes from left to right
  2. The output of the above code when executed will be Mixin2, because Mixin2 comes first in the MRO
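
The resolution order can be inspected directly through the __mro__ attribute. The sketch below redefines the classes so it runs on its own; BaseClass gets a test method here {an addition for this example} to show that the mixins still win.

```python
class BaseClass:
    def test(self):
        return "BaseClass"


class Mixin1:
    def test(self):
        return "Mixin1"


class Mixin2:
    def test(self):
        return "Mixin2"


class MyClass(Mixin2, Mixin1, BaseClass):
    pass


mro_names = [cls.__name__ for cls in MyClass.__mro__]
print(mro_names)
# ['MyClass', 'Mixin2', 'Mixin1', 'BaseClass', 'object']
print(MyClass().test())  # Mixin2
```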

Reference: Mixins and Python

Plugin Architecture

When you’re dealing with the creation of pipelines, you’re generally thinking of creating a plugin architecture. With a plugin system you generally do two things {simplistically speaking}

  1. Creating a plugin {i.e. registering the plugin}
  2. Executing the plugin

class TextProcessor(object):
    PLUGINS = []

    def process(self, text, plugins=()):
        # identity comparison with a literal is unreliable; test for emptiness
        if not plugins:
            for plugin in self.PLUGINS:
                text = plugin().process(text)
        else:
            for plugin in plugins:
                text = plugin().process(text)
        return text

    @classmethod
    def plugin(cls, plugin):
        cls.PLUGINS.append(plugin)
        return plugin


@TextProcessor.plugin
class CleanMarkdownBolds(object):
    def process(self, text):
        return text.replace('**', '')

Which can be used as follows

processor = TextProcessor()
processed = processor.process(text="**foo bar**", plugins=(CleanMarkdownBolds, ))
processed = processor.process(text="**foo bar**")

On a related note, takeaways from Plugins: Adding Flexibility to Your Apps include:

  1. Use of decorators {E.g. click’s command decorator}
  2. Use of getattr to call functions dynamically
  3. Creation of plugin architecture using simple functions and decorators {E.g. creation of @register decorator}
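
A function-based variant of the same idea could look like the sketch below. The PLUGINS registry, the register decorator and the run_pipeline helper are hypothetical names chosen for this example, not taken from the referenced talk.

```python
PLUGINS = {}


def register(func):
    """Record the function under its own name and return it unchanged."""
    PLUGINS[func.__name__] = func
    return func


@register
def strip_bold(text):
    return text.replace("**", "")


@register
def lowercase(text):
    return text.lower()


def run_pipeline(text, names):
    # Look each plugin up by name and apply them in the given order.
    for name in names:
        text = PLUGINS[name](text)
    return text


print(run_pipeline("**Foo Bar**", ["strip_bold", "lowercase"]))  # foo bar
```

Because plugins are looked up by name, the list of steps can come from configuration rather than code.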

Reference:

  1. Building a minimal plugin architecture in Python
  2. Observer in Python

ECS {Entity Component System}

  1. ECS is a way of organizing data {typically used in games and simulations}
  2. You have a Space {e.g. a world} that you want to populate with Things
  3. Things may or may not have features in common
  4. The Object Oriented solution is to have a Thing base class
  5. Limitation of the Object Oriented solution: features can only be inherited from the classes above in the hierarchy
  6. A workaround using Interfaces is fine, but has its own limitations
  7. E.g. Platypus {is a Mammal which lays Eggs}
  8. In ECS:
    1. Entity: is the base {Thing} on which everything is based. Implementations typically use structs, classes, or associative arrays.
    2. Component: Instead of inheriting a new feature, you add a Component to the Entity. Implementations typically use structs, classes, or associative arrays. E.g.
      1. Has fur
      2. Lays eggs
    3. System: Each System runs continuously and performs global actions on every Entity that possesses a Component of the same aspect as that System. Implementations typically use Threads
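
The split into entities, components and systems can be sketched in a few lines. This is a simplified illustration, not a reference implementation: the World class, the Position and Velocity components and movement_system are hypothetical, and the system here is a plain function called once rather than a continuously running thread.

```python
from dataclasses import dataclass


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Velocity:
    dx: float
    dy: float


class World:
    """Stores components per type, keyed by entity id."""

    def __init__(self):
        self._next_id = 0
        self._components = {}  # component type -> {entity id: component}

    def create_entity(self, *components):
        entity = self._next_id
        self._next_id += 1
        for component in components:
            self._components.setdefault(type(component), {})[entity] = component
        return entity

    def entities_with(self, *types):
        # Yield the components of every entity that has all requested types.
        stores = [self._components.get(t, {}) for t in types]
        for entity in stores[0]:
            if all(entity in store for store in stores):
                yield tuple(store[entity] for store in stores)


def movement_system(world, dt):
    # Only entities with both Position and Velocity are touched.
    for position, velocity in world.entities_with(Position, Velocity):
        position.x += velocity.dx * dt
        position.y += velocity.dy * dt


world = World()
arrow_position = Position(0.0, 0.0)
rock_position = Position(5.0, 5.0)
world.create_entity(arrow_position, Velocity(2.0, 1.0))
world.create_entity(rock_position)  # no Velocity, so it never moves

movement_system(world, dt=0.5)
print(arrow_position)  # Position(x=1.0, y=0.5)
print(rock_position)   # Position(x=5.0, y=5.0)
```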

Example from wikipedia

Suppose there is a drawing function. This would be a “System” that iterates through all entities that have both a physical and a visible component, and draws them. The visible component could typically have some information about how an entity should look (e.g. human, monster, sparks flying around, flying arrow), and use the physical component to know where to draw it. Another system could be collision detection. It would iterate through all entities that have a physical component, as it would not care how the entity is drawn. This system would then, for instance, detect arrows that collide with monsters, and generate an event when that happens. It should not need to understand what an arrow is, and what it means when another object is hit by an arrow. Yet another component could be health data, and a system that manages health. Health components would be attached to the human and monster entities, but not to arrow entities. The health management system would subscribe to the event generated from collisions and update health accordingly. This system could also now and then iterate through all entities with the health component, and regenerate health.

Reference:

  1. Entity Component System
  2. Entity Component System Overview in 7 Minutes

REA {Resources Entity Action}

The REA pattern provides a nicer abstraction for RTS {Real Time Strategy} games, and can also be extended to simulations.

Key aspects to RTS games include:

  1. Units {E.g. Workers, Armies}
  2. Resources {E.g. Gas, Mineral}
  3. Buildings {E.g. Barracks, Robotic Facilities}
  4. Battle Stats {E.g. APM etc}

Quoting the paper

1. Resources: numerical values in the battle and economic system of the game. In this group we find the attack, defense, and life patterns of entities. Resources also cover building materials and costs of production, deployment of units, development of new weapons, etc. (Resources are scalars.)

2. Entities: container for resources. They have physical properties and, as for the game logic, the difference among them is only the interactions. These interactions take place with resource exchanges through the actions. (Entities are vectors.)

3. Actions: resource flow among entities. Our model can be viewed as a directed weighted graph where the nodes are the entities, the weights are the amounts of exchanged resources, and the edges are the actions, that is, the elements which connect entities to one another. (Actions are transformation matrices.)
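
As a rough illustration of the three concepts, the sketch below models entities as containers of named resources and an action as a transfer between two entities. It is a simplification: the paper describes actions as transformation matrices, whereas here an action is a plain function, and the Entity class, transfer function and mining scenario are all hypothetical.

```python
class Entity:
    """A named container of resources (resource name -> amount)."""

    def __init__(self, name, **resources):
        self.name = name
        self.resources = dict(resources)


def transfer(source, target, resource, amount):
    """Action: move up to `amount` of a resource from source to target."""
    moved = min(amount, source.resources.get(resource, 0))
    source.resources[resource] = source.resources.get(resource, 0) - moved
    target.resources[resource] = target.resources.get(resource, 0) + moved
    return moved


mineral_field = Entity("mineral_field", mineral=100)
worker = Entity("worker", mineral=0)
base = Entity("base", mineral=50)

transfer(mineral_field, worker, "mineral", 8)  # the worker mines
transfer(worker, base, "mineral", 8)           # the worker delivers

print(mineral_field.resources)  # {'mineral': 92}
print(worker.resources)         # {'mineral': 0}
print(base.resources)           # {'mineral': 58}
```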

Parsing

Extracting emails from PDF

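You might want to extract data from PDF files. The following script flags PDF files in the current directory whose text matches any keyword listed in keywords.txt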

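import os
import re
import tqdm
import textract

# build one alternation pattern out of the keywords {one keyword per line}
with open("keywords.txt") as keywords_file:
    pattern = "|".join(keyword.strip() for keyword in keywords_file if keyword.strip())

def has_matching_keyword(filename):
    """
    this function counts keyword matches in the text of a PDF file
    """
    try:
        # textract.process returns bytes, so decode before matching
        text = textract.process(filename).decode("utf-8", errors="ignore")
    except Exception:
        # treat files that cannot be parsed as having no matches
        return 0
    return len(re.findall(pattern, text))

for current in tqdm.tqdm(os.listdir(".")):
    if current.lower().endswith(".pdf"):
        if has_matching_keyword(current) > 0:
            print("Possible match: %s" % current)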

Things to note:

  1. We’re using keywords.txt to construct regex pattern, which is searched within file contents
  2. tqdm package is used to show progress with regards to processing files

Reference: textract

Parsing using pyparsing

pyparsing allows you to create grammars and implement parsers. Following are a couple of examples

from pyparsing import Word, alphas, OneOrMore, Literal, oneOf

# define grammar
greet = Word(alphas) + "," + Word(alphas) + "!"

# input string
hello = "Hello, World!"

# parse input string
print(hello, "->", greet.parseString(hello))


# define grammar for a more complex case
word = Word(alphas+"'.")
salutation = OneOrMore(word)
comma = Literal(",")
greete = OneOrMore(word)
endpunc = oneOf("? !")
greeting = salutation + comma + greete + endpunc

test_cases = ["Hello, Sidharth!", "Hello, Sidharth how is your day?"]
# map is lazy in Python 3, so materialize it before printing
print(list(map(greeting.parseString, test_cases)))

Implement parsing of queries using a grammar

from pyparsing import Word, alphas, oneOf

color = oneOf("red blue")
category = oneOf("shirts shoes")
color_category = color.setResultsName("color") + category.setResultsName("category")
category_color = category.setResultsName("category") + color.setResultsName("color")
query = color_category | category_color

print(list(map(query.parseString, ["red shirts", "blue shoes", "shoes blue", "shirts red"])))

Reference: pyparsing quick reference: A Python text processing tool

Parsing Excel files using Pandas

Pandas is a pretty powerful library for data-science work. Following is an example of how we can process an Excel file sheet by sheet

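import pandas as pd

xl = pd.ExcelFile("sample.xlsx")
sheets = xl.sheet_names
# read each sheet, then stack them into a single data-frame
# {DataFrame.append was removed in pandas 2.0, so pd.concat is used}
frames = []
for current in sheets:
    frames.append(pd.read_excel(xl, current))
df = pd.concat(frames)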

References: Python pandas.ExcelFile() Examples

Generation

Generating Excel with multiple sheets

xlwt is a package that can be used to generate Excel files {in the legacy .xls format}. Following is an example from the official docs

import xlwt
from datetime import datetime

style0 = xlwt.easyxf('font: name Times New Roman, color-index red, bold on',
    num_format_str='#,##0.00')
style1 = xlwt.easyxf(num_format_str='D-MMM-YY')

wb = xlwt.Workbook()
ws = wb.add_sheet('A Test Sheet')

ws.write(0, 0, 1234.56, style0)
ws.write(1, 0, datetime.now(), style1)
ws.write(2, 0, 1)
ws.write(2, 1, 1)
ws.write(2, 2, xlwt.Formula("A3+B3"))

wb.save('example.xls')

Another example of writing nested list into excel

import xlwt
from datetime import datetime

excel = xlwt.Workbook()

def append_sheet(sheet_name, headers, results):
    """
    this method is used to append sheet
    """
    sheet = excel.add_sheet(sheet_name)

    # write headers
    for i, current in enumerate(headers):
        sheet.write(0, i, current)

    # write remaining rows
    for row_index in range(1, len(results) + 1):
        for column_index in range(0, len(results[0])):
            sheet.write(row_index, column_index, results[row_index-1][column_index])

report_name = "%s-MIS-Reports.xls" % datetime.now().date()

# this is where headers and results are generated
# {gen_aggregate_activity_report is application specific and not shown here}
headers, results = gen_aggregate_activity_report()

# generate the excel file
append_sheet('4_Aggregate_Activity_Report', headers, results)
excel.save(report_name)

Generating CSV/TSV with Pandas

Assuming df is a Pandas data-frame object, the following can be used to save data to CSV

df.to_csv('sample.csv', index=False, header=True)

For TSV we need to pass a tab as the separator

df.to_csv('sample.tsv', index=False, header=True, sep='\t')

Generating content with Jinja2

Jinja is a powerful templating engine modelled on Django’s templating. It can be used in file-based content generation scenarios

from jinja2 import Template

with open('sample.tpl') as file_:
    template = Template(file_.read())

print (template.render(name='John'))

sample.tpl would look something like

Howdy {{name}}!

SQLAlchemy

SQLAlchemy provides an ORM {Object Relational Mapper} which allows you to associate Python classes with database tables. Following is the gist of how it works

  1. Classes are mapped to Tables
  2. Instances are mapped to Rows
  3. Attributes are mapped to Columns in Tables

This is useful when working with databases; it allows us to query them without having to write SQL by hand.

Connecting to DB

from sqlalchemy import create_engine
engine = create_engine('sqlite:///mydb.sqlite', echo=True)

Note:

  1. The echo flag controls the verbosity of SQLAlchemy {it logs the generated SQL}; in production it should be set to False
  2. The return value of create_engine is an engine instance, which is what is used to work with databases
  3. The engine that is created does not talk to the DB yet; it connects only when it is first asked to perform some task

Declare a Mapping

When working with ORM, you need to define two things

  1. Class that will represent code/objects that will be used in the application
  2. Mapping of Classes to actual DB Tables

In SQLAlchemy this is done in a single step using Declarative. With this system, all classes that we want to map to the DB must inherit from Base.

# since SQLAlchemy 1.4; older versions import it from sqlalchemy.ext.declarative
from sqlalchemy.orm import declarative_base
from sqlalchemy import Column, Integer, String
Base = declarative_base()

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)

    def __repr__(self):
        return "<User(name='%s', fullname='%s', password='%s')>" % (self.name, self.fullname, self.password)

Base.metadata.create_all(engine)

Note:

In code listing above, Base.metadata.create_all(engine) is used to actually create the DB and its associated Tables.

Creating Session and updating DB

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
session = Session()
sidharth = User(name="iamsidd", fullname="Sidharth Shah", password="mylittlesecret")
session.add(sidharth)
session.commit()

Querying

for instance in session.query(User).order_by(User.id):
    print(instance.name, instance.fullname)

Aliasing is a feature that will allow you to give friendly names to classes

from sqlalchemy.orm import aliased
user_alias = aliased(User, name='user_alias')

for row in session.query(user_alias, user_alias.name).all():
    print(row.user_alias)

Sorting using order_by

for u in session.query(User).order_by(User.id)[1:]:
    print(u.name)

Filtering using filter_by

for u in session.query(User).filter_by(fullname='Sidharth Shah'):
    print(u.name)

Chaining operations {e.g. filtering and sorting}

query = session.query(User).filter(User.name.like('%iam%')).order_by(User.id)
query.all()

Misc queries

query.first()
query.count()

NoSQL

MongoDB model class

from datetime import datetime
from pymongo import MongoClient

DBNAME = 'ft_analytics'
client = MongoClient()
DB = client[DBNAME]

class MongoDBModel:
    def encode_object_id_to_string(self, rec):
        rec["_id"] = str(rec["_id"])
        return rec

    def add(self, rec):
        """
        stamp the record with the current time and insert it
        {note: this does not check for duplicates}
        """
        rec['ts'] = datetime.now()
        self.collection.insert_one(rec)

    def find(self, filter_criteria):
        return list(self.collection.find(filter_criteria))

    def list_all(self):
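        # map is lazy in Python 3, so return a concrete list
        return list(map(self.encode_object_id_to_string, self.collection.find()))

    def count(self):
        # Cursor.count() was removed in PyMongo 4; count_documents replaces it
        return self.collection.count_documents({})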

class Event(MongoDBModel):
    def __init__(self):
        self.collection = DB['events']

    def filtered_events(self, start, end):
        return self.collection.find({'ts': {'$gte': start, '$lte': end}})

    def filtered_events_by_role(self, start, end, role):
        return self.collection.find({'ts': {'$gte': start, '$lte': end}, 'user_type': role})

Flask

Fake JSON responses

For testing a frontend or mobile application, you might need to generate JSON responses. The following snippet will be of use in those situations

from flask import Flask, jsonify
app = Flask(__name__)

@app.route('/')
def fake_response():
    return jsonify({'response': 'Hello, Mehta!'})

To run this server you can run following commands:

export FLASK_APP=fake-server.py
flask run --host 0.0.0.0

Requests

requests is a great package that makes it easier and more intuitive to work with HTTP requests {it’s a substitute for urllib}. It can be used in the following scenarios

  1. API Integration
  2. Crawling HTML pages
  3. Automated submission of form

Quickstart

>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r = requests.post('https://httpbin.org/post', data = {'key':'value'})
>>> r = requests.put('https://httpbin.org/put', data = {'key':'value'})
>>> r = requests.delete('https://httpbin.org/delete')
>>> r = requests.head('https://httpbin.org/get')
>>> r = requests.options('https://httpbin.org/get')

Params in GET/POST

>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get('https://httpbin.org/get', params=payload)

POST Request

>>> URL = 'https://httpbin.org/post'
>>> payload = {"param": "a", "param_1": "b"}
>>> response = requests.post(URL, data=payload)

Custom Headers

>>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)