Monday, July 10, 2023

When you need to open 3 (or more) files in Python3

The problem is I needed to open 3 files:

1. CSV source file to read
2. CSV file to write the results to after processing the first file
3. Another CSV file to hold metadata from processing the 1st CSV, kept separate from the results in the 2nd

Solution

There are 2 practical ways to do it. There's a 3rd but it has a very specific use case.

As for Python3, you can basically do this using context managers via the 'with' keyword:


with open('a', 'r') as a_file, open('b', 'w') as b_file, open('c', 'wb') as c_file:    
    do_something()
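
For the three-CSV case described at the top, a minimal sketch could look like the one below. The function name, the uppercase "processing" step, and the metadata columns are placeholders I made up for illustration; only the csv module and the multi-file 'with' are the point here.

```python
import csv


def process_csv(source_path, results_path, metadata_path):
    """Read one CSV, write processed rows to a second and row metadata to a third."""
    # newline="" is what the csv module docs recommend for files handed to reader/writer
    with open(source_path, newline="") as src, \
         open(results_path, "w", newline="") as out, \
         open(metadata_path, "w", newline="") as meta:
        reader = csv.reader(src)
        results = csv.writer(out)
        metadata = csv.writer(meta)
        metadata.writerow(["row_number", "column_count"])
        for row_number, row in enumerate(reader, start=1):
            # Placeholder processing: trim and uppercase every cell
            results.writerow([cell.strip().upper() for cell in row])
            metadata.writerow([row_number, len(row)])
```

All three files get closed when the block exits, even if the processing raises halfway through.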

This syntax also exists in Python 2.7. On older Python 2 versions you would need the contextlib.nested() function instead, which was deprecated in 2.7 and is not available in current Python 3.

The other approach is for when you need to process files sequentially rather than opening all of them at the same time. Useful if you have a variable number of files:

for fname in filenames:
    with open(fname) as f:
        pass  # Process f here

You can also use this if you would rather keep the results in memory and only write them to a file when done.
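
Here is a rough sketch of that "keep it in memory, write at the end" idea. Counting lines is just a stand-in for whatever processing you actually do, and the function name and summary format are my own assumptions.

```python
def summarize_files(filenames, summary_path):
    """Process files one at a time, then write all results in one go."""
    results = []
    for fname in filenames:
        with open(fname) as f:
            # Stand-in processing: count the lines in each file
            results.append((fname, sum(1 for _ in f)))
    # Only one output file is opened, and only after every input is closed
    with open(summary_path, "w") as out:
        for fname, line_count in results:
            out.write(f"{fname},{line_count}\n")
    return results
```

Only one file is open at any time, so this works no matter how long the list of filenames is.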

Reference: 

  • https://stackoverflow.com/questions/4617034/how-can-i-open-multiple-files-using-with-open-in-python

Thursday, March 30, 2023

Notes on Solving Simple Vehicle Routing with Python and Google Maps only

Simple Vehicle Routing is a problem related to finding the "optimum" path between a set of locations. The rub here is dealing with real-world addresses and road data (e.g. traffic, road conditions, etc.). Google's documentation (or propaganda) will say this is easy but that's slightly misleading.
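
To make "optimum path" concrete, here is a toy sketch in plain Python. It is NOT how OR-Tools solves it: this just brute-forces every ordering for a single vehicle, which only works for a handful of stops. The travel times are made up; in the real setup they would come from Google Maps.

```python
from itertools import permutations


def shortest_round_trip(distances, depot=0):
    """Return (total, route) for the shortest loop from the depot through every stop."""
    stops = [i for i in range(len(distances)) if i != depot]
    best = None
    for order in permutations(stops):
        route = (depot, *order, depot)
        # Sum the cost of each consecutive leg of the route
        total = sum(distances[a][b] for a, b in zip(route, route[1:]))
        if best is None or total < best[0]:
            best = (total, route)
    return best


# Made-up travel times in minutes (row = from, column = to); location 0 is the depot
distances = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(shortest_round_trip(distances))  # (80, (0, 1, 3, 2, 0))
```

The number of orderings grows factorially with the number of stops, which is why a real solver is needed once you go past a few locations or add more vehicles.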

What I've learned:

1. Setting up to use the Google Maps APIs (yes, APIs; we are going to use most of them) needs a billing account or credit card. It will not cost you anything though, because Google gives you a US$200 monthly credit. So unless you exceed 200 bucks worth of usage, you pay nothing, BUT it's a good idea to set a limit just in case.

2. Jupyter Notebook is handy for figuring out and testing stuff. It's just a little finicky to get running. For example:

  • I had to roll back ipywidgets to 7.7.2 from the latest 8.0.9 to get gmaps to work
  • I had to change Python versions from 3.11 to 3.9 because scipy was being a pain when installing ortools. OR-Tools depends on scipy. Luckily I already use pyenv so this was relatively painless.

Screenshot: Jupyter Notebook running with gmaps and OR-Tools
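
Since the version pins are easy to forget, a small check like the one below can be run in the first notebook cell. This is only a sketch: the helper name is mine, and the only exact pin in these notes is ipywidgets 7.7.2, so that is the only one listed.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the notes above; add more as you find them
EXPECTED = {"ipywidgets": "7.7.2"}


def check_pins(expected):
    """Return {package: (wanted, found)} for every package that does not match its pin."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            problems[name] = (wanted, found)
    return problems
```

Calling check_pins(EXPECTED) returns an empty dict when everything matches, so anything else is a hint to fix the environment before blaming gmaps.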


Next step is figuring out how to get what I've learned into a REST API for a delivery/pickup app. I'm thinking of using Starlite + RethinkDB.

Ref: 
  • https://woolpert.com/managing-simple-vrp-with-google-maps-platform/
  • https://developers.google.com/optimization/routing/vrp
  • https://jupyter-gmaps.readthedocs.io/en/latest/install.html
  • https://stackoverflow.com/questions/72371859/attributeerror-module-collections-has-no-attribute-iterable

Monday, February 13, 2023

Notes on Different Docker image tags

Full Images

These are images based on the latest release of an operating system. For example, if you're talking about `python:latest` then this would be the most recent stable Debian release. These images are usually the ones we use at the start of a project.

The catch with these images is that they are usually pretty big. The latest python image clocks in at around 1GB uncompressed. If image size is not a concern then using these full images in production poses little risk.

Slim

Denoted by `-slim` in the tag, these are pared-down versions of the full image. They contain the bare minimum of packages needed to run your particular tool, e.g. Python, Node.js, etc.

The catch with these is that a lot of stuff you might use will not be available, so you need to test when using this type of image. You'll often stumble into errors like missing dependencies or libraries.

Alpine

Unlike the full images that are based on Debian, alpine-tagged images are based on Alpine Linux, a lightweight distribution that is popular for containers. Using these images usually results in the smallest possible image.

The catch is that Alpine uses different stuff compared to Debian. For example, it uses musl instead of glibc to keep the size down. That might break your app if it depends on glibc, e.g. through compiled dependencies that were built against it.
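
If you are not sure which C library a container ended up with, Python can give a best-effort answer. This is a sketch with a caveat: platform.libc_ver() reports glibc by name, and my assumption is that an empty name means "something else", which on an Alpine image would most likely be musl.

```python
import platform


def libc_name():
    """Best-effort guess at the C library the running Python is linked against."""
    name, _version = platform.libc_ver()
    # An empty name usually means "not glibc" (e.g. musl on Alpine, or a non-Linux OS)
    return name or "unknown (possibly musl)"


print(libc_name())
```

Running this inside both the slim and the alpine image is a quick way to see the difference before a compiled dependency fails on you.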

Other tags like stretch/buster/jessie/bullseye

These tags denote that the image uses a certain version of Debian. For example, jessie is Debian 8, stretch is Debian 9, buster is Debian 10, and bullseye is Debian 11.

What to use then?

1. For local development, use the full image.

2. It's also OK to use the full image for production if size is not an issue. Just don't use the latest tag, because it always pulls the newest image and that might break your app's dependencies. Pin a specific version tag instead.

3. Use slim instead of alpine. Alpine is cool and all but it's a pain to debug.