User:QEDK/GSoC 2020/Working with paths in Python and Rust, and a note on BLM

From mediawiki.org

A while ago, I was writing an integration test script for my Rust crates (configparser and ini) which involved loading an .ini file inside the same directory (i.e. inside the /tests directory). To show a quick picture of what I mean -

Medium

You'd guess that to access "test.ini", I would need to simply give the name of the file; after all, they are in the same directory. Not quite. When running integration tests, cargo runs your integration tests from the root context, I'd typically have guessed that it should involve bringing the entire "test" directory into scope, but unfortunately not.

So, the end result is something a bit different, not bad at all but gets annoying, even after putting the files into the same directory, you have to access it outside the scope. However, there's probably a reason behind this, let's talk about that. Here's an example of the correct path to load the file from-

use configparser::ini::Ini;
use std::error::Error;
#[(test)]
fn main() -> Result<(), Box<dyn Error>> {
  let mut config = Ini::new();
  let map = config.load("tests/test.ini")?;
}

I had an example code snippet in the documentation of the module (in "lib.rs"), so logically speaking if I was to load the "test.ini" file, what would the path look like? Probably making the access by changing to the parent directory, then going into "tests", then accessing "test.ini"? That's where the true logic of running tests from the crate root shines. Here's the actual doctest code -

use configparser::ini::Ini;
use std::error::Error;
fn main() -> Result<(), Box<dyn Error>> {
  let mut config = Ini::new();
  let map = config.load("tests/test.ini")?;
}

Spot a difference? There isn't any (except the #[(test)] directive, but that's only to tell cargo that this is a test). And that's why the path-agnostic approach is so great because it doesn't matter where the program is being run from, the location of the import just needs to be definite. You will notice that in both cases the paths are defined relatively.

But what if I had to access the parent directory and then access files nested inside? In Rust, I would do something like -

use std::fs::File;
fn main() -> std::io::Result<()> {
    let mut f = File::open("../somefile.rs")?;
    Ok(())
}

If I were to run the binary produced by rustc, the path-agnosticism is sadly lost - because it has to depend on the location of the binary; think of how we're letting it know about the location - go up one level -> open "somefile.rs". This behaviour has to remain consistent for all binaries, so quite rightly, it remains consistent.

But what if I had to get user input and I want to ensure that the path remains consistent, there are several design considerations, relative or absolute paths is the first. Typically, they are distinguished like:

  • When beginning with a forward slash (/), it is denoted to be an absolute path.
  • Otherwise, it's a relative path.

Now, what if you choose to use shell expansions or a mixture of absolute and relative paths, that's when the water starts to get muddy. In most cases, the approach remains the same, get the "correct" absolute path, but that's easier said than done.

Paths in Python[edit]

Now, about Python, depending on where I'm running the interpreter from, the location of a certain file will change, so I need to absolutely ensure that the position of a file remains consistent across relative or absolute usage, there are only two dynamics at play (mostly) -

  1. For user input, like configuration files, it's generally best to get the absolute path. In most cases, it won't be, so the responsibility somewhat lies with the developer to ensure the correct path.
  2. In case when a script is trying to get files related to its code, it should have a relative path. In this case, it's absolutely the job of the developer to ensure the correct path.

In case of Python, the standard library has a saving grace {{subst:emdash}} os.path.abspath. In fact, in actual code, all it would take is:

import configparser
config = configparser.ConfigParser()
config.read(os.path.abspath(os.path.expanduser("~/test.ini")))

Yes, that's including the shell-expansion built-in (just a note that you could do it probably as easy on Rust, although you'd need something like shellexpand).

Looks simple enough, but the true magic is what's inside path.abspath(). It can be treated equivalent to: path.normpath(join(os.getcwd(), path)). And what does normpath() exactly do? The documentation reads like:

Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. On Windows, it converts forward slashes to backward slashes.

So, a relative path given to abspath() will function normally, if you specified a path like, ../templates/src, it would join the path with the current working directory, and even if you're running it from a different directory, it will be collapsed down to the appropriate location, similarly whether you use expansions like ~ or $HOME, expanduser() will appropriately expand them for you in the path and pass it to abspath() to get the normalized path to the file - and if you use an absolute path, normpath() will simply collapse it relative to the current working directory if inside it, or use the absolute path. So, you're covered in all cases. 😄

*insert witty xkcd comic*

This is truly a philosophical question, what's the best way of going up a directory, I have a preference but it depends on what you need and what you want. The cleanest example, well -

import os
os.chdir("..")

It has its own host of problems but surprisingly, platform-agnostic is not one of them. I can't attest to all systems but it works swimmingly on Windows and Unix systems. But it's too less colourful for my liking and personally, my use-case demanded a definite relative path, so I didn't quite like this (the directory is dependent on the current working directory afterall).

So, for my use-case I decided on:

import os
import json
with open(os.path.join(os.path.dirname(__file__), "..", "templates", "src")) as file:
	self.src = json.load(file)

Quite personally, I like the approach, a bit unwieldy but easy for me to stomach. A similar approach without the unwieldiness but an extra import would look like -

Path(__file__).parents[1].joinpath("templates", "src")

But even that's as wordy as my join(). If you're interested in seeing more, this page has tons of "good" approaches, but my favourite is clearly -

__file__.rsplit(os.sep, 2)[0] + '/templates/src'

Just comes at the cost of a bit of readability.

And now: Black Lives Matter[edit]

I've seen people in the tech community who believe that this is a political movement unrelated to the tech industry but I have to disagree. It's a fact that black people and especially black women are under-represented in the STEM community. In fact, most of the women representation in this community comes from the healthcare and medical industries.

Medium

I like statistics and despite statistics being very useful for misleading the public, they are a very important way of conveying numbers to the masses. In 2019, the latest diversity reports from Twitter, Google and Facebook said that less than 5 percent of the companies' tech workers identify as black. In Silicon Valley as a whole, blacks and Hispanics make up between 3 percent and 6 percent of workers, and women of colour are 1 percent or less. There's a clear imbalance that we need to address, and the easiest way to do so - and this is no small ask - is to encourage companies to diversify their workplaces and to encourage members of marginalized communities to be trailblazers in their own right.


That's all for today, see you in the next one! 😄

Contributions
goodbot
  • Implemented user-facing features, such as intelligently answering questions based on given FAQs.
  • Subscribing them depending on whether they are an Outreachy/GSoC/GSoD participant.
  • Redirection them to technical resources.
  • Modularize bot replies
  • Implement fuzzy-matching to answer FAQs