Onward(s) and Upward(s)

Just a small programming note: after ten and a half years, I left my last position, and moved on to a new one. I’m still managing, still writing, but at a much younger company, which is always interesting.

As per my usual policy, I won’t mention names here, but if you follow me on LinkedIn, you can figure it out.

Also worth mentioning that as always, all words here are mine, and I don’t speak for my employer.

Unrelated grammar nit: I was trying to determine if the saying was onward and upward or onwards and upwards. The first one sounded right to me. Both of the -ward words in question are adverbs with the -s, or adjectives without. (You look downwards; Nine Inch Nails has a great record called The Downward Spiral.)

The saying could be using adverbs or adjectives though, so none of this helps. I think both AP and Chicago Manual of Style would say that Americans would drop the -s and British love to keep it. So, onward and upward.

A quick cmd trick

I wish I could tell the 1990 me that the 2020 me would be posting DOS commands in a blog instead of using a brain-implant computer on the surface of Mars. (I also wish 2020 me could tell 1990 me to patent the idea for an online book store, or that the Reds would sweep the A’s in the World Series.) Anyway, here’s a quick one I should have known already.

You can set the prompt in a command window by setting the environment variable PROMPT. There is a PROMPT command too, which I vaguely remember from the DOS days and setting up an AUTOEXEC.BAT file. But the PROMPT command only lasts for the current session, while a PROMPT variable saved to your environment applies to every new session.

The default would be PROMPT $P$G: show your current directory, then a greater-than sign, so you get C:\this\that> as your prompt. But you can do some other fun stuff:

  • $D – the date
  • $T – the time
  • $V – the OS version number
  • $B – a pipe symbol
  • $G and $L – greater-than and less-than
  • $S – space
  • $+ – a plus sign for each level of depth in the PUSHD directory stack (If you’re not a habitual user of PUSHD/POPD, look into it.)

There are more; do a PROMPT /? to see them.
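To make the codes a little more concrete, here's a rough Python sketch of what cmd does with a PROMPT string. It walks through the string and swaps each $ code for its value. The expand_prompt function is hypothetical, because cmd doesn't expose anything like it. It covers only a handful of codes, and the time and date formats are my guesses at a US-style locale. Treat it as an illustration, not a spec.

```python
from datetime import datetime


def expand_prompt(template: str, cwd: str, now: datetime) -> str:
    """Hypothetical helper: roughly expand a cmd.exe PROMPT string.

    Covers only a few codes; real cmd output varies with locale and version.
    """
    codes = {
        "P": cwd,      # current drive and path
        "G": ">",      # greater-than
        "L": "<",      # less-than
        "B": "|",      # pipe
        "S": " ",      # space
        "$": "$",      # literal dollar sign
        "_": "\r\n",   # carriage return and linefeed
        # Assumed US-style formats for time (with hundredths) and date.
        "T": now.strftime("%H:%M:%S") + f".{now.microsecond // 10000:02d}",
        "D": now.strftime("%a %m/%d/%Y"),
    }
    out = []
    i = 0
    while i < len(template):
        if template[i] == "$" and i + 1 < len(template):
            # Codes are case-insensitive ($p$g works too); this sketch
            # simply drops codes it doesn't know about.
            out.append(codes.get(template[i + 1].upper(), ""))
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


print(expand_prompt("$T$S$P$G", r"C:\this\that", datetime(2020, 3, 17, 14, 5, 9, 120000)))
# prints 14:05:09.12 C:\this\that>
```

Plug in the default, expand_prompt("$P$G", r"C:\this\that", datetime.now()), and you get back C:\this\that>.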

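I normally set an environment variable in Windows by right-clicking My Computer and going to Properties > Advanced > Environment variables. Another fun trick is that you can use the SETX command to do this, too. (SETX saves the variable permanently, but it only shows up in command windows you open afterward, not the one you ran it in.) So I did this: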


That adds the time to the front of the default prompt. I did this because Jekyll builds take forever, and I wanted an easy way to tell whether I started the build two minutes or seventeen minutes ago, because time loses all meaning on the last day of a release.

(Thanks: http://www.hanselman.com/blog/a-better-prompt-for-cmdexe-or-cool-prompt-environment-variables-and-a-nice-transparent-multiprompt)


The one-line web server

Every now and again, I run into a situation where a bunch of HTML files can’t be opened by simply double-clicking the index.html, and need to be hosted on a web server to behave properly. I think this used to happen with WebWorks output, and built Jekyll output does it too. It’s also sometimes handy to have a web server spinning in the background so you can modify and preview things on the fly.

In the past when this came up, I’d fall down a wormhole about installing the Apache or IIS web server, or trying to figure out where Apple moved the built-in out-of-date Apache web server in the latest version of macOS. (I think it’s gone completely in Catalina.) But instead of sifting through the many different Apache installers and Medium articles about how to get an entire full stack going, I realized this is dead simple if you have Python installed.

First, do you have Python installed? A quick python -V will tell you. If you don’t, grab a copy.

Open a command line in the directory you want to serve up, and if you have Python 3, do this:

python3 -m http.server

If you’re a Windows user, that is probably python and not python3.

If you’re still using Python 2, do this instead:

python -m SimpleHTTPServer

Now point a web browser at http://localhost:8000 and there’s your stuff.

If you’ve already got something on port 8000, put another port number after the command, like python3 -m http.server 8080. Any free port up to 65535 works.

Don’t expect to run your Fortune 500’s production environment on this; the Python docs themselves say http.server isn’t recommended for production. It’s great for testing, though.
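Sometimes you'd rather start the server from a script, say to serve a specific output folder in the background while you do other things. The same module works from code. This is a minimal sketch that assumes Python 3.7 or later. start_server is a made-up helper name, and binding to 127.0.0.1 is my own choice so the files aren't visible to the rest of the network. (On 3.7+, the command line also takes --directory and --bind options that do the same things.)

```python
import functools
import http.server
import threading


def start_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Hypothetical helper: serve `directory` over HTTP in a background thread."""
    # SimpleHTTPRequestHandler is what "python3 -m http.server" uses;
    # its directory argument (Python 3.7+) picks the folder to serve.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    # Listen on localhost only (the command line listens on all interfaces by default).
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread, so it won't keep the program alive on exit.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Something like srv = start_server("_site", 8080) serves Jekyll's build folder at http://localhost:8080/, and srv.shutdown() stops it. If you pass port 0, the OS picks a free port, which you can read back from srv.server_address[1].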

(Thanks: https://developer.mozilla.org/en-US/docs/Learn/Common_questions/set_up_a_local_testing_server and https://docs.python.org/3/library/http.server.html)

Various Cabin Fever Projects

Due to cabin fever (or whatever you call it), I’ve been working on a lot of oddball programming projects. I mean, I’m working way too much at the day job, because I can’t really leave the house, so I’m always on. But I’ve also been doing various things (and then not finishing them) mostly because I don’t have a basement where I can start building a boat out of matchsticks or something.

Anyway, here’s a partial list of what’s on the various back burners of my stove. If you’re really insanely curious about any of these, let me know and maybe I’ll actually finish them.

  • I’ve been messing around more on GitHub. I’ve been doing this at my job too, but I’m trying to make an effort to be more active on my personal profile. If you have the time and you use any open source software, I’d encourage you to do the same. Even if you’re not a programmer, it’s good to know how to complain about their documentation and maybe fix a few things if you find them. My personal profile is here: https://github.com/jkonrath
  • GitHub has caused me to dig up all of my old college coding projects and see if they’re worthwhile to post on there, just for kicks. They are all fairly horrible, so no. I’m almost tempted to post the first coding project I actually got paid to do, which I still have. It was for the USGS and it did something with well depth analysis, which I totally don’t understand. The code itself is an exercise in how not to do what we’d now call Big Data analysis, and would probably be about ten lines of Python today. Fun nostalgia, though.
  • Just for kicks, I started writing a Markdown to HTML program, but decided to do it all in straight-up C, using C89 (ANSI C) and the standard library, nothing more. Outside of Arduino C, I haven’t done much C programming in… a long time. I realized why when I started working with strings. I have a newfound appreciation for Python’s built-in string support.
  • I also discovered, digging through the archives, that I know more XSLT than I thought. I posted a gist about this on GitHub, but I probably should make a whole project that’s a collection of all the dumb little building blocks I used constantly when I did this on a daily basis. (Like, I always have to look up how to split one file into many files. I need to write that down somewhere.)
  • I found this giant XSLT I used at a previous job to chop up Doxygen’s XML output and convert it into something I could pull into a structured FrameMaker doc. Looking at it, I’m not sure how useful it is, because I was doing a bunch of arbitrary reordering of the doc because… well, let’s say just because.
  • I also started working on an XSLT to convert WordPress output to a Flare document. I really do love that Flare documents are just XML, and you can do anything with them.
  • I converted this blog to Hugo, but in the process I realized it’s faster for me on a site this small to keep it in WordPress, so I didn’t use the conversion. It was a fun learning process, though.
  • I’ve been messing with ArchiveBox, which is a neat idea. It’s a system where you feed in URLs and it archives them in various formats, sort of a DIY Internet Archive. I’ve been learning more Docker by running it with Docker Compose, and complaining on their GitHub page about various minor problems. I should spend some time helping them rewrite their docs.
  • Another site I should pay more attention to is Stack Overflow. I’m on there at https://stackoverflow.com/users/99038/jon-konrath. It’s funny to look back at all the FrameMaker and Doxygen posts I was answering way back when.

Speaking of obsessively building stuff in your basement, this is one of my favorite videos ever: https://vimeo.com/166403522

PDF Printing with Headless Chrome

I’ve got docs for a bunch of products in Jekyll, written mostly in Markdown, but also some HTML output from DITA and Flare. It works great, but doc reviews are a nightmare. We typically use Acrobat shared reviews for doc review, which mostly works, but the question became how to make PDFs of this stuff, especially the Markdown.

There are a lot of obvious and not-so-obvious half-answers here. Like, why not save as PDF from your Markdown editor? (You don’t get any of the templating from Jekyll, and none of the variables show up. You also have to do it from each file, by hand.) I spent years chasing a good solution for this, and tried Pandoc, Prince, PrawnPDF, wkhtmltopdf, DITA-OT, and Gotenberg, hitting some roadblock with each one.

My short-term solution was starting Jekyll locally, going to each page in a browser, doing a Ctrl-P, and printing to a PDF. For every page. Every single page. For every review. That was obviously not a great way to handle it.

This is when I found something genius: Headless Chrome.

Basically, headless Chrome runs the Chrome browser in the background, and then you can throw commands at it and get the output. If you ever used the venerable DZBatcher back in the day for non-interactive FrameMaker, it’s a similar concept. If you’re a Node programmer, there’s a bunch of crazy stuff you can do with it. But there’s also a command-line interface where you can do some basic stuff from a shell script.

If you have a version of Chrome newer than mid-2017, you’re set. You can do something like this:

chrome --headless --disable-gpu --print-to-pdf https://www.jonkonrath.com/

Another fun switch is --screenshot, which dumps the page to a PNG.

There are a bunch of caveats, and Chrome produces almost no logging or error messages, so debugging this is maddening. Keep the following in mind:

  • Headless Chrome uses the location of your Chrome binary as the working directory, and tries to save files there. If you’re using Windows, chances are you can’t write files in C:\Program Files (x86), and if the write fails, Chrome won’t tell you. And if it does work, it will pollute that directory with output while you wonder where it’s putting the PDFs. You can specify an output file with --print-to-pdf="/full/path/file.pdf", giving a full absolute path there.
  • There is an --enable-logging switch, which is vaguely helpful.
  • The URL you specify has to have a protocol in front of it (http:// or https://) or it will silently fail.
  • If you do something wrong, Chrome will go sideways, but keep running in the background without complaint. You’ll need to stop every Chrome process to get it to work again.

I can’t publish the Windows CMD script I wrote for two reasons: one, it belongs to my employer because they pay me, and two, it’s the worst piece of garbage you’ve ever seen. The five lines of code I had to write to generate a timestamped directory look like one of my cats slept on my keyboard for 45 minutes. Anyway, here’s basically what I did:

  1. Pass in a .txt file with a list of URLs in the order I eventually want the output.
  2. Create a new timestamped directory.
  3. Loop through the text file, and run the print-to-pdf command once per line, dumping them into the timestamped directory.

When I named the output files, I stripped the protocol and hostname off the URL, replaced the slashes with dashes, and put a sequence number in front. So if I printed http://example.com/this/is/the/index.html, it would create 1-this-is-the-index.html. Keeping the path in the name means pages that share a file name (like all those index.html files) don’t overwrite each other.
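I can't share the CMD script, so here's roughly the same idea as a Python sketch instead. The assumptions are as follows:

- chrome is on your PATH. On Windows you'll probably need the full path to chrome.exe.
- The list file holds one full URL per line.
- The sequence numbers are zero-padded to three digits, so a plain alphabetical sort keeps the order.
- The .html extension is swapped for .pdf.

All the helper names are made up. The only Chrome switches it uses are --headless, --disable-gpu, and --print-to-pdf.

```python
import re
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from urllib.parse import urlparse

CHROME = "chrome"  # assumption: on your PATH; otherwise the full path to chrome.exe


def pdf_name(index: int, url: str) -> str:
    """Drop protocol and host, turn slashes into dashes, number the file.

    http://example.com/this/is/the/index.html -> 001-this-is-the-index.pdf
    """
    path = urlparse(url).path.strip("/") or "index"
    stem = re.sub(r"\.html?$", "", path.replace("/", "-"))
    return f"{index:03d}-{stem}.pdf"


def build_commands(urls, outdir: Path):
    """One headless Chrome command per URL, in list order."""
    commands = []
    for i, url in enumerate(urls, start=1):
        # Absolute output path, so Chrome doesn't try to write into its install folder.
        target = (outdir / pdf_name(i, url)).resolve()
        commands.append([CHROME, "--headless", "--disable-gpu", f"--print-to-pdf={target}", url])
    return commands


def main(list_file: str) -> None:
    with open(list_file, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    # Chrome fails silently on URLs without a protocol, so check up front.
    missing = [u for u in urls if not u.startswith(("http://", "https://"))]
    if missing:
        sys.exit(f"These URLs need http:// or https://: {missing}")
    outdir = Path(datetime.now().strftime("pdfs-%Y%m%d-%H%M%S"))  # timestamped folder
    outdir.mkdir()
    for cmd in build_commands(urls, outdir):
        subprocess.run(cmd)  # Chrome reports almost nothing, so check the folder afterward


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

You'd run it as something like python print_pdfs.py file-list.txt (the script name is made up too), then combine the PDFs in the new folder.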

Then when I have that directory of PDFs, I Ctrl-A, right-click, and combine the files into a PDF, then run a shared review on that. The beauty is that when I do round two of the review, I simply run the script again, and the files are in the same order. (Otherwise, the files are going to be in alphabetical order, which probably isn’t what your reviewers want.)

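A handy way to get that file started is with a dir /b > file-list.txt from the command line in your top-level source directory (add /s to pick up subdirectories). Don’t forget to edit out the images and directory names from your list, and to turn each file name into a full URL with the http:// in front.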
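To skip some of the hand-editing, a few lines of Python can walk the built site and write out URLs instead of bare file names. This sketch assumes Jekyll's default _site output folder and the default jekyll serve address of http://localhost:4000/. list_urls is a hypothetical helper. It keeps only .html files, so images and directory names never make it into the list. You'll still want to reorder the file by hand into the order your reviewers expect.

```python
import os
import sys
from pathlib import Path


def list_urls(site_dir: str, base_url: str = "http://localhost:4000/") -> list:
    """Hypothetical helper: turn every .html file under site_dir into a URL."""
    urls = []
    for root, dirs, files in os.walk(site_dir):
        dirs.sort()  # sorting in place makes os.walk visit subfolders in a stable order
        for name in sorted(files):
            if name.endswith(".html"):  # skips images, CSS, and so on
                rel = Path(root, name).relative_to(site_dir).as_posix()
                urls.append(base_url + rel)
    return urls


if __name__ == "__main__" and len(sys.argv) > 1:
    # One URL per line, ready for reordering and for the PDF script.
    with open("file-list.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(list_urls(sys.argv[1])) + "\n")
```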

This isn’t a great way to do production-ready PDFs, and you’ll have no good control over headers, footers, page layout, or anything else. Also, none of the links in the PDF will work right; they’ll point to localhost if you’re running Jekyll locally. But it’s close enough for review purposes.

More info on Headless Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome

xml.etree.ElementTree and OS randomness

OK, I’ll start by doing something you aren’t supposed to do: parse HTML using a bunch of regular expressions in Python. Don’t do this. Life is too short. Use something like BeautifulSoup. I couldn’t, because I had to use standard libraries only, so here I am.

Anyway, my program manipulated a tree of XHTML using xml.etree.ElementTree. Typical stuff, straight out of any tutorial:

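import re
import xml.etree.ElementTree as ET
root = ET.parse(htmlfname).getroot()  # htmlfname: path to the XHTML file
# If the file declares xmlns="http://www.w3.org/1999/xhtml", find('body') returns None;
# use root.find('{http://www.w3.org/1999/xhtml}body') instead.
bod = root.find('body')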

Then after a bunch of scraping, I used a regex like this to get the address in the href of the link:

anchor_addr = re.search("<a href=\"([^\"]*)\"",anchor_base).group(1)

(I’m converting HTML to Markdown. Don’t ask why.)

Anyway, this worked fine on Windows. I pushed the code, a Unix machine in the CI/CD pipeline built it, and the build failed on this line. The match failed, so re.search returned None, and calling .group(1) on None raised an AttributeError.

At first, I thought it was the typical problem with line endings, like my code was splitting strings on \n and Windows had \r\n. After messing around with that, I found it wasn’t the case.

Here’s the problem. xml.etree.ElementTree stores an element’s attributes in a dictionary, and the order they come back out in when you serialize the element depends on your Python version. Before Python 3.8, ElementTree sorted attributes alphabetically on output; 3.8 and later keep them in the order they appeared in the document. (Plain dictionaries have preserved insertion order since Python 3.7, so the dictionary itself wasn’t the culprit.) It looks like the Python I was using on Windows was new enough to keep the original order, but the ones on my home Mac and on this Unix build machine were returning the attributes alphabetically. So <a href="foo" alt="hi"> was becoming <a alt="hi" href="foo"> and breaking my regexp.

My regexp didn’t really need to match the start of the tag, because the string I was searching was already the anchor element. So I was able to change the regexp to "href=\"([^\"]*)\"", and that worked, provided the attribute was never written in uppercase (HREF) or with single quotes (href='foo').
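For what it's worth, the element was already parsed, so ElementTree can hand over the attribute directly with no regular expression at all. Here's a minimal sketch using Element.get on a made-up fragment. The parser has already dealt with quoting and order, so single-quoted and reordered attributes both just work. One caveat: if the XHTML declares the http://www.w3.org/1999/xhtml namespace, the tag names need the {http://www.w3.org/1999/xhtml} prefix, like doc.iter("{http://www.w3.org/1999/xhtml}a").

```python
import xml.etree.ElementTree as ET

# A made-up XHTML fragment standing in for the real input.
doc = ET.fromstring('<body><p>See <a href="foo" alt="hi">this</a>.</p></body>')

# Read the attribute from the parsed element instead of regex-matching
# serialized text, so attribute order and quote style don't matter.
hrefs = [a.get("href") for a in doc.iter("a")]
print(hrefs)  # prints ['foo']

# Serialization is where the order varies: Python 3.8+ keeps document order,
# while earlier versions sorted attributes alphabetically.
print(ET.tostring(doc.find(".//a"), encoding="unicode"))
```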

Long story short, don’t use regular expressions to parse HTML.

A few quick updates

Five years, five updates. Sounds about right.

A few quick updates, in the interest of posting more than once a year:

  • I’ve been managing other writers for almost four years now, so I should probably change my byline to reflect that. I’m still writing and working on tool/architecture stuff. But at some point, I should write down some snappy articles with management tips and tricks. Maybe some stuff on synergy and paradigm shifts.
  • My byline also mentioned being a FrameMaker nerd. I’ve barely used FrameMaker in the last five years, except to get stuff out of it. Aside from a few things still lingering in *.fm files, most of my company uses DITA, and some of us use Markdown and Jekyll. But the eventual plan is to move everything to MadCap Flare, so maybe I’ll write about that more later. For now, strike FrameMaker from the byline.
  • I’ve been doing some programming lately, mostly in the quest to get a bunch of DITA converted over to Flare. Whenever I’ve run into a conversion or production problem that required custom code, I’ve always said “I’m not a programmer” and then spent months trying to get developer time to fix it. I think it’s time I change that to “I’m not a very good programmer” and work on improving that. Lately, that’s involved a lot of Python, so you may see some posts on that.

Of course, I’m saying all of this, and I may not post again until 2024 because I’m so busy. But, stay tuned.

Git choose-your-own-adventure

I use git on a daily basis. Not to age myself, but I went from rcs to cvs to svn to git, with some Perforce and SourceSafe sprinkled in there, so I have a long background in source control. But git is just different enough to throw me sometimes. And for whatever reason, a lot of git documentation gets a bit, well, academic. It’s hard to find a quick answer sometimes, and when there is one, it’s often buried in very theoretical arguments about whether git submodules are essential or the worst thing since Agent Orange.

Anyway, here’s my go-to for answers when I break something: http://sethrobertson.github.io/GitFixUm/fixup.html

There are two things I like about this page. First, it gives you answers without a lot of guff. Less is more, etc.

The second thing is that it’s presented as a choose-your-own-adventure. Are you trying to do this? Did you do this? Do you want to do this? It’s a simple troubleshooting flowchart, something I’ve written in docs before, and something you’ll run into every time you call tech support. (There it gets a bad rap because the first step is always “did you try restarting?” and that seems insulting. It’s also the fix 90% of the time. That’s another topic, though.)

But as a kid who grew up devouring Bantam books from the book fair at my grade school, and memorizing The Cave of Time way back then, I find something very intuitive and appealing about this form of docs. I’m not going to say it makes a git branching disaster (the kind where you think you’ve lost everything) fun, but it makes it easier to drill through the problem and find a solution. I don’t really have any docs at work where I could pull off something like this, but as docs in general become more informal and user-oriented, this would be a great format to use.

Contributing to Open Source

One of the questions I’m always asked is how to get experience in tech writing. It’s a chicken/egg problem: you need a job to get experience, but you need experience to get a job. You can attend a code camp or complete a certificate program to get some real-world experience to add to your resume, but what I always tell people is to get started with open-source software.

There is a lot of open-source software out there these days. I got started in computing before this was the case, but was an early adopter of the Linux OS in the early 90s, which changed everything. Now, entire communities have sprung up around the development of operating systems, servers, programming languages, and desktop software. Tools like GitHub have centralized participation, putting source code in one place and providing tools for quick communication about projects, instead of crufty old mailing lists. That has made it very easy to explore projects and contribute to them.

Here’s a great article about how to get started on this:


This example is specifically about a developer contributing to the Node.js runtime. But there are a lot of opportunities for tech writers, because documentation for open-source projects is not always that great. The catch is that projects don’t always have many documentation bugs filed. It takes hunting down a project you like (check out the “explore” section of GitHub to browse through things) and deciding for yourself what needs improvement.

Overall, it’s a great way to gain experience, and also improve tools that everyone will use. And, maybe you’ll someday get a job out of it.


I wish I had learned this forever ago:

  • If you are in America, it is gray
  • If you are in England, it is grey

It’s that simple.

(I partially blame this confusion on Fifty Shades of Grey.)