Useful Recipes
==============

Following is a collection of Python snippets that we've discovered along the way.

RegEx
-----

Places where regular expressions are used extensively:

1. Validation Rules {E.g. Email, Phone, Addresses}
2. Scraping {E.g. Extracting Prices, Text Snippets}
3. Translation {E.g. Replacing all upper case letters with lower case}
4. Parsing Logs {E.g. Parsing Nginx and Apache logs}

Things to keep in mind while working with RegEx:

1. Quantifiers are `greedy` by default {that is, they match as much text as possible while still letting the overall pattern match, even when a shorter match would have been sufficient}
2. When writing a regex, make sure you test both **positive** and **negative** test cases

Handy resources:

1. `RegEx 101 `_ : Handy tool for interactively testing regex
2. `RegEx Cheatsheet `_
3. `RegEx Howto `_
4. `RegEx Golf `_
5. `Using Regular Expressions in Python 3 `_

Grammar
```````

Characters

1. `\d` : Digits {0-9}
2. `\w` : Word characters, which include letters {both upper and lower case}, digits and underscore
3. `\s` : White space {including tabs, new lines and carriage returns}

Note: The capitalized versions of the above match the inverse. E.g. `\D` will match anything that is not a digit

Quantifiers

1. `.` : Matches any single character except a new line {strictly a wildcard rather than a quantifier}
2. `?` : Optional match {zero or one occurrence}
3. `+` : Matches at least one occurrence
4. `{x}` : Matches the expression exactly x times
5. `{x,y}` : Matches the expression between x and y times {note that there is no space after the comma}

Logic

1. `|` : Matches either of the alternative sub-expressions
2. `(...)` : Encapsulates sub-expressions in a group
3. `^` : If used at the start of an expression it means "at the start"
4. `^` : If used as the first character inside `[]` it negates the character class

Class Characters

1. `[]`: Matches any one of the characters listed inside
2. `[a-z]`: Range of characters between a and z

findall
```````

`Official re module doc `_, also contains a good number of examples

.. code-block:: python

    import re

    text_snippet = "there was a PEACH who PINCH, in return punch were flying around"

    # re.compile compiles the regex into a pattern object,
    # which makes it easier to reuse the regex
    # re.IGNORECASE is a flag; multiple flags can be combined with |
    pch_regex = re.compile(r"p.{1,3}ch", re.IGNORECASE)

    # findall returns every non-overlapping match as a list of strings
    for current_match in pch_regex.findall(text_snippet):
        print(current_match)

search
``````

.. code-block:: python

    import re

    def validate_email(current_email):
        """
        check if email is valid
        """
        email_re = re.compile(r"\w+\@\w+\.(com|co\.in)", re.IGNORECASE)
        # .search() method is used to TEST if regex matches at all
        return email_re.search(current_email) is not None

    print(validate_email("spammy@gmail.com"))
    print(validate_email("spammy@.co"))

finditer
````````

.. code-block:: python

    import re

    text_snippet = "there was a PEACH who PINCH, in return punch were flying around"

    pch_regex = re.compile(r"p.{1,3}ch", re.IGNORECASE)

    # finditer yields match objects, which carry the position of each match
    for current_match in pch_regex.finditer(text_snippet):
        print("Starts at:%d, Ends at:%d" % (current_match.start(), current_match.end()))

sub
```

.. code-block:: python

    import re

    text_snippet = "there was a PEACH who PINCH, in return punch were flying around"

    pch_regex = re.compile(r"p.{1,3}ch", re.IGNORECASE)

    # replace every match with an underscore
    text_snippet_translated = pch_regex.sub("_", text_snippet)
    print(text_snippet_translated)

List
-----

Sorting nested list by length
`````````````````````````````

.. code-block:: python

    >>> x = [[1], [1,2,3,4,5], [1,2,3]]
    >>> sorted(x, key=len, reverse=True)
    [[1, 2, 3, 4, 5], [1, 2, 3], [1]]

Collections
-----------

Frequencies using Counter
`````````````````````````

.. code-block:: python

    from collections import Counter

    x = [1, 2, 3, 4, 5, 6, 7, 1, 2, 1, 2, 1]
    # list of (element, count) tuples for the 3 most frequent elements
    Counter(x).most_common(3)

Dictionaries
------------

Sorting by Value
````````````````

You may want to iterate over a dictionary sorted by value; this can be achieved using the `sorted` function

.. code-block:: python

    import operator

    # example dictionary {hypothetical data}
    d = {'a': 3, 'b': 1, 'c': 2}
    # list of (key, value) tuples sorted by value
    sorted_d = sorted(d.items(), key=operator.itemgetter(1))

Reference: `How to sort a dictionary by values in Python `_

Default Dictionaries
````````````````````

A default dictionary allows you to perform operations without having to check for membership

.. code-block:: python

    from collections import defaultdict

    # missing keys start at int(), i.e. 0
    counts = defaultdict(int)
    counts['foo'] += 1

Dictionary Comprehension
````````````````````````

.. code-block:: python

    d = {n: n**2 for n in range(5)}

Object Oriented Programming
---------------------------

StateMachine
````````````

The concept of `states `_ is central to computer science. At times we'd want to implement a state machine in the object oriented domain. Following is how to do it

.. code-block:: python

    class State:
        def __init__(self):
            pass

        def run(self):
            raise NotImplementedError("run not implemented")

        def next(self, input):
            raise NotImplementedError("next not implemented")

Things to note:

1. `__init__`, the initializer, can be used to set the initial state of the state machine
2. The `next` method takes input and decides whether the state changes or remains the current state. Validation rules can also be implemented as part of this method
3. The `run()` method is used to execute the state

Reference: `StateMachine `_

Inversion of Control
````````````````````

Associated with the concept of `IoC` is that of `Coupling `_ . Think of it as a kind of `Vendor lock-in`. Concretely speaking, let's say

1. You create a base class `Engine`
2. When inheriting from this class for `DieselEngine`, `GasolineEngine` and `ElectroEngine`, we're creating a coupling {between these classes and the base class `Engine`}

Following is what inversion of control would look like

.. code-block:: python

    class Car(object):
        """Example car."""

        def __init__(self, engine):
            """Initializer."""
            self._engine = engine  # Engine is injected

Here we're passing an instance of the engine to the initializer {as opposed to instantiating it inside `Car`}

Reference: `Dependency injection and inversion of control in Python `_

Facade Pattern
``````````````

A facade allows you to consolidate functionality. It unifies the functionality of a lot of different objects or services into one simple API.

.. code-block:: python

    # Tyre and Tank are assumed to be defined elsewhere
    class Car(object):

        def __init__(self):
            self._tyres = [Tyre('front_left'),
                           Tyre('front_right'),
                           Tyre('rear_left'),
                           Tyre('rear_right'), ]
            self._tank = Tank(70)

        def tyres_pressure(self):
            return [tyre.pressure for tyre in self._tyres]

        def fuel_level(self):
            return self._tank.level

Think of them as an `Aggregator`. Related to them are `Proxy `_ and `Adapter `_

Adapter
```````

An adapter is about altering interfaces. It allows you to wrap an object/class to implement the methods you're expecting. E.g. you've written a logger which takes `destination` as a parameter and expects it to have a `write` method, but `socket` doesn't have a `write` method. You can write an adapter for it as below

.. code-block:: python

    import socket
    from datetime import datetime

    class SocketWriter(object):

        def __init__(self, ip, port):
            self._socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self._ip = ip
            self._port = port

        def write(self, message):
            # an unconnected UDP socket needs sendto, and it expects bytes
            self._socket.sendto(message.encode('utf-8'), (self._ip, self._port))

    def log(message, destination):
        destination.write('[{}] - {}'.format(datetime.now(), message))

    udp_logger = SocketWriter('1.2.3.4', 9999)
    log('Something happened', udp_logger)

Reference: `Design Patterns in Python `_

Singleton
`````````

Singletons are used when you only want to have a single instance of an object. They are usually useful for the configuration and logging aspects of an application.

.. code-block:: python

    class Logger(object):
        def __new__(cls, *args, **kwargs):
            if not hasattr(cls, '_logger'):
                # object.__new__ does not accept extra arguments in Python 3
                cls._logger = super(Logger, cls).__new__(cls)
            return cls._logger

Avoid overusing `Singleton`

Mixins
``````

`Composition over Inheritance `_ has been a key concept in modern programming. Mixins allow you to achieve composition; think of them as being similar to Interfaces in Java {or Protocols in Swift}.

.. code-block:: python

    class BaseClass(object):
        pass

    class Mixin1(object):
        def test(self):
            print("Mixin1")

    class Mixin2(object):
        def test(self):
            print("Mixin2")

    class MyClass(Mixin2, Mixin1, BaseClass):
        pass

    x = MyClass()
    x.test()

Things to note:

1. `BaseClass` is on the right as opposed to the left, this is because of `Method Resolution Order `_
2. The output of the above code when executed will be `Mixin2` because of MRO

Reference: `Mixins and Python `_

Plugin Architecture
```````````````````

When you're dealing with the creation of `Pipelines`, you're generally thinking of creating a `Plugin Architecture`. With a plugin system you generally do two things {simplistically speaking}

1. Creating a plugin {i.e. registering the plugin}
2. Executing the plugin

.. code-block:: python

    class TextProcessor(object):
        PLUGINS = []

        def process(self, text, plugins=()):
            # fall back to all registered plugins when none are given
            if not plugins:
                plugins = self.PLUGINS
            for plugin in plugins:
                text = plugin().process(text)
            return text

        @classmethod
        def plugin(cls, plugin):
            # register the plugin class and return it unchanged
            cls.PLUGINS.append(plugin)
            return plugin

    @TextProcessor.plugin
    class CleanMarkdownBolds(object):
        def process(self, text):
            return text.replace('**', '')

Which can be used as follows

.. code-block:: python

    processor = TextProcessor()
    processed = processor.process(text="**foo bar**", plugins=(CleanMarkdownBolds, ))
    processed = processor.process(text="**foo bar**")

Related ideas from `Plugins : Adding Flexibility to Your Apps `_ include:

1. Use of decorators {E.g. click's command decorator}
2. Use of `getattrs `_ to call functions dynamically
3. Creation of a plugin architecture using simple functions and decorators {E.g. creation of a `@register` decorator}

Reference:

1. `Building a minimal plugin architecture in Python `_
2. `Observer in Python `_

ECS {Entity Component System}
`````````````````````````````

1. ECS is a way of organizing data {typically used in games and simulations}
2. You have a `Space` {e.g. world} you want to populate with `Things`
3. `Things` may or may not have some features in common
4. The Object Oriented solution is to have a `Thing` base class
5. Limitation of the Object Oriented approach {only inherit from one level above}
6. A workaround using `Interfaces` is fine, but has its own limitations
7. E.g. Platypus {a Mammal which lays Eggs}
8. In ECS:

   1. Entity: the base class {`Thing`} on which everything is based. Implementations typically use structs, classes, or associative arrays.
   2. Component: instead of using inheritance to add a new feature, you add a `Component`. Implementations typically use structs, classes, or associative arrays. E.g.

      1. `Has fur`
      2. `Lay eggs`

   3. System: Each System runs continuously and performs global actions on every Entity that possesses a Component of the same aspect as that System. Implementations typically use Threads

Example from Wikipedia

Suppose there is a drawing function. This would be a "System" that iterates through all entities that have both a physical and a visible component, and draws them. The visible component could typically have some information about how an entity should look (e.g. human, monster, sparks flying around, flying arrow), and use the physical component to know where to draw it.

Another system could be collision detection. It would iterate through all entities that have a physical component, as it would not care how the entity is drawn. This system would then, for instance, detect arrows that collide with monsters, and generate an event when that happens. It should not need to understand what an arrow is, and what it means when another object is hit by an arrow.
Yet another component could be health data, and a system that manages health. Health components would be attached to the human and monster entities, but not to arrow entities. The health management system would subscribe to the event generated from collisions and update health accordingly. This system could also now and then iterate through all entities with the health component, and regenerate health.

Reference:

1. `Entity Component System `_
2. `Entity Component System Overview in 7 Minutes `_

REA {Resources Entity Action}
`````````````````````````````

The `REA Pattern `_ provides a nicer abstraction for RTS {Real Time Strategy} games and can also be extended to simulations. Key aspects of RTS games include:

1. Units {E.g. Workers, Armies}
2. Resources {E.g. Gas, Mineral}
3. Buildings {E.g. Barracks, Robotic Facilities}
4. Battle Stats {E.g. APM etc}

Quoting the paper

1. Resources: numerical values in the battle and economic system of the game. In this group we find the attack, defense, and life patterns of entities. Resources also cover building materials and costs of production, deployment of units, development of new weapons, etc. (Resources are scalars.)
2. Entities: container for resources. They have physical properties and, as for the game logic, the difference among them is only the interactions. These interactions take place with resource exchanges through the actions. (Entities are vectors.)
3. Actions: resource flow among entities. Our model can be viewed as a directed weighted graph where the nodes are the entities, the weights are the amounts of exchanged resources, and the edges are the actions, that is, the elements which connect entities to one another. (Actions are transformation matrices.)

Parsing
-------

Extracting emails from PDF
``````````````````````````

You might want to extract data from PDF files. The following snippet flags the PDF files in the current directory whose text matches a set of keywords

.. code-block:: python

    import os
    import re
    import tqdm
    import textract

    # build one alternation pattern from the non-empty lines of keywords.txt
    with open("keywords.txt") as keywords_file:
        keywords = [keyword.strip() for keyword in keywords_file if keyword.strip()]
    pattern = "|".join(keywords)

    def has_matching_keyword(filename):
        """
        return the number of keyword matches found in the text of a PDF file
        """
        try:
            # textract returns bytes, so decode before matching
            text = textract.process(filename).decode("utf-8", errors="ignore")
        except Exception:
            # treat unreadable files as having no matches
            return 0
        return len(re.findall(pattern, text))

    for current in tqdm.tqdm(os.listdir(".")):
        if current.lower().endswith(".pdf"):
            if has_matching_keyword(current) > 0:
                print("Possible match: %s" % current)

Things to note:

1. We're using `keywords.txt` to construct the regex `pattern`, which is searched for within the file contents
2. The `tqdm` package is used to show progress while processing files

Reference: `textract `_

Parsing using pyparsing
```````````````````````

`pyparsing `_ allows you to create grammars and implement parsers. Following are a couple of examples

.. code-block:: python

    from pyparsing import Word, alphas, OneOrMore, Literal, oneOf

    # define grammar
    greet = Word(alphas) + "," + Word(alphas) + "!"
    # input string
    hello = "Hello, World!"
    # parse input string
    print(hello, "->", greet.parseString(hello))

    # define grammar for a more complex case
    word = Word(alphas+"'.")
    salutation = OneOrMore(word)
    comma = Literal(",")
    greetee = OneOrMore(word)
    endpunc = oneOf("? !")
    greeting = salutation + comma + greetee + endpunc

    test_cases = ["Hello, Sidharth!", "Hello, Sidharth how is your day?"]
    print([greeting.parseString(test_case) for test_case in test_cases])

Implement parsing of queries using a grammar

.. code-block:: python

    from pyparsing import oneOf

    color = oneOf("red blue")
    category = oneOf("shirts shoes")
    color_category = color.setResultsName("color") + category.setResultsName("category")
    category_color = category.setResultsName("category") + color.setResultsName("color")
    # accept either word order
    query = color_category | category_color
    queries = ["red shirts", "blue shoes", "shoes blue", "shirts red"]
    print([query.parseString(current) for current in queries])

Reference: `pyparsing quick reference: A Python text processing tool `_

Parsing Excel files using Pandas
````````````````````````````````

`Pandas `_ is a pretty powerful library for data-science scenarios. Following is an example of how we can process an excel file sheet by sheet

.. code-block:: python

    import pandas as pd

    xl = pd.ExcelFile("sample.xlsx")
    sheets = xl.sheet_names
    # read each sheet into its own data-frame, then stack them
    # {DataFrame.append was removed in pandas 2.0, hence pd.concat}
    frames = [xl.parse(current) for current in sheets]
    df = pd.concat(frames)

References: `Python pandas.ExcelFile() Examples `_

Generation
----------

Generating Excel with multiple sheets
`````````````````````````````````````

`Xlwt `_ is a package that can be used to generate excel files. Following is an example from the official docs

.. code-block:: python

    import xlwt
    from datetime import datetime

    style0 = xlwt.easyxf('font: name Times New Roman, color-index red, bold on',
                         num_format_str='#,##0.00')
    style1 = xlwt.easyxf(num_format_str='D-MMM-YY')

    wb = xlwt.Workbook()
    ws = wb.add_sheet('A Test Sheet')

    ws.write(0, 0, 1234.56, style0)
    ws.write(1, 0, datetime.now(), style1)
    ws.write(2, 0, 1)
    ws.write(2, 1, 1)
    ws.write(2, 2, xlwt.Formula("A3+B3"))

    wb.save('example.xls')

Another example of writing a nested list into excel

.. code-block:: python

    import xlwt
    from datetime import datetime

    excel = xlwt.Workbook()

    def append_sheet(sheet_name, headers, results):
        """
        this method is used to append a sheet
        """
        sheet = excel.add_sheet(sheet_name)
        # write headers
        for i, current in enumerate(headers):
            sheet.write(0, i, current)

        # write remaining rows, starting below the header row
        for row_index, row in enumerate(results, start=1):
            for column_index, value in enumerate(row):
                sheet.write(row_index, column_index, value)

    report_name = "%s-MIS-Reports.xls" % datetime.now().date()
    # this is where headers and results are generated
    # {gen_aggregate_activity_report is assumed to be defined elsewhere}
    headers, results = gen_aggregate_activity_report()
    # generate the excel file
    append_sheet('4_Aggregate_Activity_Report', headers, results)
    excel.save(report_name)

Generating CSV/TSV with Pandas
``````````````````````````````

Assuming `df` is a Pandas data-frame object, the following can be used to save data to CSV

.. code-block:: python

    df.to_csv('sample.csv', index=False, header=True)

For `TSV` we need to do the following

.. code-block:: python

    # to_csv takes the separator as sep
    df.to_csv('sample.tsv', index=False, header=True, sep='\t')

Generating content with Jinja2
``````````````````````````````

`Jinja `_ is a powerful templating engine modelled after Django's templates. It can be used in file based content generation scenarios

.. code-block:: python

    from jinja2 import Template

    with open('sample.tpl') as file_:
        template = Template(file_.read())
    print(template.render(name='John'))

`sample.tpl` would look something like

.. code-block:: html

    Howdy {{name}}!

SQLAlchemy
----------

SQLAlchemy provides an `ORM - Object Relation Mapping `_ which allows you to associate Python classes with database tables. Following is the gist of how it works

1. Classes are mapped to Tables
2. Instances are mapped to Rows
3. Attributes are mapped to Columns in Tables

This is useful when working with databases, as it allows us to query them without having to write SQL by hand.

Connecting to DB
````````````````

.. code-block:: python

    from sqlalchemy import create_engine
    engine = create_engine('sqlite:///mydb.sqlite', echo=True)

**Note**:

1. The `echo` flag controls the verbosity of SQLAlchemy {it logs the generated SQL}; in production it should be set to False
2. The return value of `create_engine` is an engine instance, which is what is used to work with databases
3. The engine that is created is not talking to the DB yet; it will connect when it is first asked to perform some task

Declare a Mapping
`````````````````

When working with an ORM, you need to define two things

1. Classes that represent the objects used in the application
2. Mapping of those classes to actual DB tables

In SQLAlchemy both are done in a single step using `Declarative `_. With this system, all classes that we want to map to the DB must inherit from `Base`.

.. code-block:: python

    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy import Column, Integer, String

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"

        id = Column(Integer, primary_key=True)
        name = Column(String)
        fullname = Column(String)
        password = Column(String)

        def __repr__(self):
            return "<User(name='%s', fullname='%s', password='%s')>" % (self.name, self.fullname, self.password)

    Base.metadata.create_all(engine)

**Note**: In the code listing above, `Base.metadata.create_all(engine)` is what actually creates the tables {and, for SQLite, the database file} if they don't already exist.

Creating Session and updating DB
````````````````````````````````

.. code-block:: python

    from sqlalchemy.orm import sessionmaker
    Session = sessionmaker(bind=engine)
    session = Session()

    sidharth = User(name="iamsidd", fullname="Sidharth Shah", password="mylittlesecret")
    session.add(sidharth)
    session.commit()

Querying
````````

.. code-block:: python

    for instance in session.query(User).order_by(User.id):
        print(instance.name, instance.fullname)

Aliasing is a feature that allows you to give friendly names to classes

.. code-block:: python

    from sqlalchemy.orm import aliased
    user_alias = aliased(User, name='user_alias')

    for row in session.query(user_alias, user_alias.name).all():
        print(row.user_alias)

Sorting using `order_by`

.. code-block:: python

    # slicing skips the first row of the sorted result
    for u in session.query(User).order_by(User.id)[1:]:
        print(u.name)

Filtering using `filter_by`

.. code-block:: python

    for u in session.query(User).filter_by(fullname='Sidharth Shah'):
        print(u.name)

Chaining operations {e.g. filtering and sorting}

.. code-block:: python

    query = session.query(User).filter(User.name.like('%iam%')).order_by(User.id)
    query.all()

Misc queries

.. code-block:: python

    query.first()
    query.count()

NoSQL
-----

MongoDB model class
```````````````````

.. code-block:: python

    from datetime import datetime
    from pymongo import MongoClient

    DBNAME = 'ft_analytics'
    client = MongoClient()
    DB = client[DBNAME]

    class MongoDBModel:
        # subclasses are expected to set self.collection
        def encode_object_id_to_string(self, rec):
            rec["_id"] = str(rec["_id"])
            return rec

        def add(self, rec):
            """
            add a record, stamping it with the current time
            """
            rec['ts'] = datetime.now()
            self.collection.insert_one(rec)

        def find(self, filter_criteria):
            return list(self.collection.find(filter_criteria))

        def list_all(self):
            return [self.encode_object_id_to_string(rec) for rec in self.collection.find()]

        def count(self):
            # Cursor.count() was removed in PyMongo 4
            return self.collection.count_documents({})

    class Event(MongoDBModel):
        def __init__(self):
            self.collection = DB['events']

        def filtered_events(self, start, end):
            return self.collection.find({'ts': {'$gte': start, '$lte': end}})

        def filtered_events_by_role(self, start, end, role):
            return self.collection.find({'ts': {'$gte': start, '$lte': end}, 'user_type': role})

Flask
-----

Fake JSON responses
```````````````````

For testing a frontend or mobile application, you might need to generate JSON responses. The following snippets will be of use in those situations

.. code-block:: python

    from flask import Flask, jsonify
    app = Flask(__name__)

    @app.route('/')
    def fake_response():
        return jsonify({'response': 'Hello, Mehta!'})

To run this server you can run the following commands:

.. code-block:: sh

    export FLASK_APP=fake-server.py
    flask run --host 0.0.0.0

Requests
--------

`requests `_ is a great package that makes it easier and more intuitive to work with HTTP requests {it's a substitute for `urllib`}. It can be used in the following scenarios

1. API Integration
2. Crawling HTML pages
3. Automated submission of forms

Quickstart
``````````

.. code-block:: python

    >>> import requests
    >>> r = requests.get('https://api.github.com/events')
    >>> r = requests.post('https://httpbin.org/post', data = {'key':'value'})
    >>> r = requests.put('https://httpbin.org/put', data = {'key':'value'})
    >>> r = requests.delete('https://httpbin.org/delete')
    >>> r = requests.head('https://httpbin.org/get')
    >>> r = requests.options('https://httpbin.org/get')

Params in GET/POST
``````````````````

.. code-block:: python

    >>> payload = {'key1': 'value1', 'key2': 'value2'}
    >>> r = requests.get('https://httpbin.org/get', params=payload)

POST Request
````````````

.. code-block:: python

    >>> URL = 'https://httpbin.org/post'
    >>> payload = {"param": "a", "param_1": "b"}
    >>> response = requests.post(URL, data=payload)

Custom Headers
``````````````

.. code-block:: python

    >>> url = 'https://api.github.com/some/endpoint'
    >>> headers = {'user-agent': 'my-app/0.0.1'}
    >>> r = requests.get(url, headers=headers)
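
The `params` and `headers` arguments above can be explored without touching the network. Following is a minimal sketch that uses only the standard library `urllib` {not `requests` itself} to build, but not send, a comparable GET request. `build_request` is a hypothetical helper name, and for simple string values the query string is expected to carry the same `key=value` pairs that `requests` would produce

.. code-block:: python

    from urllib.parse import urlencode
    from urllib.request import Request

    def build_request(url, params=None, headers=None):
        """
        hypothetical helper: build {but do not send} a GET request
        """
        if params:
            # urlencode turns the dict into "key1=value1&key2=value2"
            url = "%s?%s" % (url, urlencode(params))
        return Request(url, headers=headers or {}, method="GET")

    req = build_request(
        "https://httpbin.org/get",
        params={"key1": "value1", "key2": "value2"},
        headers={"user-agent": "my-app/0.0.1"},
    )
    print(req.full_url)
    # urllib stores header names capitalized, hence "User-agent"
    print(req.get_header("User-agent"))

Inspecting a request this way can be handy when debugging what an API integration is about to send.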