xml.etree.ElementTree and OS randomness

OK, I’ll start by doing something you aren’t supposed to do: parse HTML using a bunch of regular expressions in Python. Don’t do this. Life is too short. Use something like BeautifulSoup. I couldn’t, because I had to use standard libraries only, so here I am.

Anyway, my program manipulated a tree of XHTML using xml.etree.ElementTree. Typical stuff like any tutorial on it:

import xml.etree.ElementTree as ET
...
root = ET.parse(htmlfname).getroot()
bod = root.find('body')
...

Then after a bunch of scraping, I used a regex like this to get the address in the href of the link:

anchor_addr = re.search("<a href=\"([^\"]*)\"",anchor_base).group(1)

(I’m converting HTML to Markdown. Don’t ask why.)

Anyway, this worked fine on Windows. I pushed the code, a Unix machine in the CI/CD pipeline built it, and it failed on this line. The match would fail, so re.search returned None, and calling .group(1) on None raised an AttributeError.

At first, I thought it was the typical problem with line endings, like my code was splitting strings into arrays on \n and on Windows the text had \r\n. After messing around with that, I found it wasn’t the case.

Here’s the problem: xml.etree.ElementTree uses a dictionary to store the attributes of an element it parses. Python dictionaries weren’t guaranteed to keep their order until Python 3.7; before that, ordering was an implementation detail. And it looks like the version of Python I was using on Windows was keeping the attributes in their original order, but the ones I was using on my home Mac and on this Unix build machine were returning the attributes alphabetically. So <a href="foo" alt="hi"> was becoming <a alt="hi" href="foo"> and breaking my regex.

My code didn’t really need to find the entire element and pull the value of the attribute, because it was already inside the element. So I was able to change that regexp to "href=\"([^\"]*)\"" and that worked, provided it was never HREF or HREF='foo'.
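Here is a minimal sketch of two ways to read the href without depending on attribute order. The function names are made up for illustration, and neither is the code from my program. The first assumes the anchor is already an Element (or well-formed markup) and simply asks ElementTree for the attribute. The second is a slightly more tolerant regex that also copes with HREF and single quotes, and returns None instead of blowing up when there is no match. For what it’s worth, the ElementTree docs note that serialization only preserves attribute order from Python 3.8 on, which would also explain a difference between machines.

```python
import re
import xml.etree.ElementTree as ET

# Matches href="..." or href='...' in any letter case.
HREF_RE = re.compile(r"""\bhref\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def href_from_element(anchor_markup):
    """Parse well-formed markup and read the attribute by name."""
    return ET.fromstring(anchor_markup).get("href")


def href_from_markup(anchor_markup):
    """Regex fallback for fragments; returns None when no href is found."""
    match = HREF_RE.search(anchor_markup)
    return match.group(2) if match else None


print(href_from_element('<a alt="hi" href="foo">link</a>'))
print(href_from_markup("<A HREF='bar'>"))
```

Reading the attribute by name is the sturdier option; the regex is only there for the cases where all you have is a string fragment.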

Long story short, don’t use regular expressions to parse HTML.

A few quick updates

Five years, five updates. Sounds about right.

A few quick updates, in the interest of posting more than once a year:

  • I’ve been managing other writers for almost four years now, so I should probably change my byline to reflect that. I’m still writing and working on tool/architecture stuff. But at some point, I should write down some snappy articles with management tips and tricks. Maybe some stuff on synergy and paradigm shifts.
  • My byline also mentioned being a FrameMaker nerd. I’ve barely used FrameMaker in the last five years, except to get stuff out of it. Aside from a few things still lingering in *.fm files, most of my company uses DITA, and some of us use Markdown and Jekyll. But the eventual plan is to move everything to MadCap Flare. So maybe I’ll write about that more later. For now, strike FrameMaker from the byline.
  • I’ve been doing some programming lately, mostly in the quest to get a bunch of DITA converted over to Flare. Whenever I’ve run into a conversion or production program that required custom code, I’ve always said “I’m not a programmer” and then spent months trying to get developer time to fix it. I think it’s time I change that to “I’m not a very good programmer” and work on improving that. As of late, that’s involved a lot of Python, so you may see some posts on that.

Of course, I’m saying all of this, and I may not post again until 2024 because I’m so busy. But, stay tuned.

Git choose-your-own-adventure

I use git on a daily basis. Not to age myself, but I went from rcs to cvs to svn to git, with some Perforce and SourceSafe sprinkled in there, so I have a long background in source control. But git is just different enough to throw me sometimes. And for whatever reason, a lot of git documentation gets a bit, well, academic. It’s hard to find a quick answer sometimes, because the answers are often buried in very theoretical arguments about whether or not git submodules are essential or the worst thing since Agent Orange.

Anyway, here’s my go-to for answers when I break something: http://sethrobertson.github.io/GitFixUm/fixup.html

There are two things I like about this page. First, it gives you answers without a lot of guff. Less is more, etc.

The second thing is that it’s presented as a choose-your-own-adventure. Are you trying to do this? Did you do this? Do you want to do this? It’s a simple troubleshooting flowchart, something I’ve written in docs before, and something you’ll run into every time you call tech support. (There it gets a bad rap because the first steps are always “did you try restarting?” and that seems insulting. It’s also the problem 90% of the time. That’s another topic, though.)

But as a kid who grew up devouring Bantam books from the book fair at my grade school, and memorizing The Cave of Time way back then, I find something very intuitive and appealing about this form of docs. I’m not going to say it makes a git branching disaster, the kind where you think you lost everything, fun, but it makes it easier to drill through the problem and find a solution. I don’t really have any docs at work where I could pull off something like this, but as docs in general become more informal and user-oriented, this would be a great format to use.
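To make the format concrete, here is a rough sketch of how a choose-your-own-adventure page boils down to a small decision tree. The questions and advice below are my own simplified placeholders, not text from the linked page, and the names TREE and walk are made up for this example.

```python
# Hypothetical, heavily simplified decision tree. The questions and advice
# are illustrative placeholders, not copied from the linked page.
TREE = {
    "question": "Have you committed the change you want to fix?",
    "yes": {
        "question": "Have you already pushed that commit?",
        "yes": {"answer": "Use 'git revert' to add a commit that undoes it."},
        "no": {"answer": "Use 'git commit --amend' to rewrite the last commit."},
    },
    "no": {"answer": "Use 'git status' and 'git diff' to see what changed."},
}


def walk(node, replies):
    """Follow a sequence of 'yes'/'no' replies; return the advice reached,
    or the next question if the replies run out first."""
    for reply in replies:
        if "answer" in node:
            break
        node = node[reply]
    return node.get("answer", node.get("question"))


print(walk(TREE, ["yes", "no"]))
```

Each page in the adventure is one node, and each link is one branch, which is probably why the format is so easy to follow when you are panicking.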

Contributing to Open Source

One of the questions I’m always asked is how to get experience in tech writing. It’s a chicken/egg problem: you need a job to get experience, but you need experience to get a job. You can attend a code camp or complete a certificate program to get some real-world experience to add to your resume, but what I always tell people is to get started with open-source software.

There is a lot of open-source software out there these days. I got started in computing before this was the case, but was an early adopter of the Linux OS in the early 90s, which changed everything. Now, entire communities have sprung up around the development of operating systems, servers, programming languages, and desktop software. Tools like GitHub have centralized participation, putting source code in one place and providing tools for quick communication about projects, instead of crufty old mailing lists. That has made it very easy to explore projects and contribute to them.

Here’s a great article about how to get started on this:

https://medium.freecodecamp.org/contributing-to-open-source-is-not-hard-here-is-my-journey-to-contributing-to-nodejs-d10760e31194

This example is specifically about a developer contributing to the Node.js runtime. But there are a lot of opportunities for tech writers, because documentation for projects is not always that great. The only issue is that there aren’t always that many documentation bugs listed in projects. It requires hunting down a project you like (check out the “explore” section of GitHub to browse through things) and deciding what needs improvement.

Overall, it’s a great way to gain experience, and also improve tools that everyone will use. And, maybe you’ll someday get a job out of it.

Gray/Grey

I wish I had learned this forever ago:

  • If you are in America, it is gray
  • If you are in England, it is grey

It’s that simple.

(I partially blame this confusion on Fifty Shades of Grey.)

FrameMaker and the Kinesis Advantage Keyboard

Because of RSI, I use a Kinesis Advantage keyboard. I used the Microsoft ergonomic keyboards for a while, but I’d burn one out every year, and the membrane keys are a bit mushy. I went whole-hog and spent the money on the Kinesis, which I don’t regret at all, except for one thing: the function keys. The F-keys and Esc are little rubber chiclet keys reminiscent – I’m dating myself here – of the Atari 400. And when you’re in FrameMaker and doing some heavy formatting, those keys are essential.

You can remap keys in the Kinesis at the keyboard level, but I switch between Windows and Mac machines through a KVM switch, so I wanted to avoid that. Instead, I use AutoHotkey, a nifty little free Windows utility that enables you to easily remap keys at the OS level.

Head over to https://autohotkey.com/ and grab a copy. My AutoHotkey.ahk has the following entries for FrameMaker:

Home & 5::Send, {F8}
Home & 6::Send, {F9}
End::Send, {Esc}

I don’t use the Home and End keys much, so I stole them for my own use. On the Kinesis, they are in my left thumb cluster, and great for frequently-used combo keystrokes. So I map:

  • Home-5 to F8. That opens the character designer.
  • Home-6 to F9. That opens the paragraph designer.
  • End to Esc. I use Esc constantly for Frame shortcuts, most notably Esc-j-j to repeat a paragraph assign, and Esc-c-c to repeat a character assign.

AHK can do a lot more stuff, like assigning keys to modify and paste the clipboard, run macros from DLLs, and more. If you have a highly repetitive text massage job you can’t tackle in a script, check out the advanced options. But it’s easy to use for a few quick remaps.

 

Adjusting Mic volume from the command line in Windows

I spend a lot of time in meetings on a USB headset. I have a Windows computer. So every day, multiple times a day, here’s the drill:

  1. Plug in headset.
  2. Mic volume is a random number.
  3. Right-click Sound.
  4. Wait for the right click to register, because half the time it doesn’t, and the lag is horrible.
  5. Select Recording Devices. See the previous step about lag.
  6. Click Microphone.
  7. Click Properties.
  8. One in ten times, this will cause the control panel to gray out and freeze, so wait five minutes and/or force quit it and start over.
  9. Click the Levels tab. (It’s not the first tab, so this takes ten seconds to scan.)
  10. Change the level back to 100.
  11. Click OK.
  12. Click OK.
  13. Feel the great joy in knowing you’ll have to do this three or four more times that day, which adds up to like 18 months of your life.

The same process on a Mac:

  1. Plug in the headset. It remembers what you had the level set to the last time it was plugged in.

There is a way around this. It’s called NirCmd, a great little free Windows command-line executable from NirSoft that enables you to do all sorts of weird things from a command prompt or batch script.

Go grab a copy of it from the NirSoft site. The first time you click on the exe, it will ask to copy itself to your Windows directory, and then it’s in your path, ready to go. Create a .bat file like this:

nircmd setsysvolume 65535 Microphone

Each time you put in your headset, run that, and you’re good to go.
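If you would rather think in percentages than in raw values, here is a small Python sketch that wraps the same command. It assumes nircmd.exe is already on your path and that 65535 is the top of the scale, as in the batch file above. The function names are made up for this example.

```python
import subprocess

MAX_VOLUME = 65535  # full volume, the same value used in the batch file


def mic_volume_command(percent, component="Microphone"):
    """Build the nircmd argument list for a volume given as 0-100 percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    level = round(MAX_VOLUME * percent / 100)
    return ["nircmd", "setsysvolume", str(level), component]


def set_mic_volume(percent):
    """Run nircmd. Windows only; assumes nircmd.exe is on the PATH."""
    subprocess.run(mic_volume_command(percent), check=True)


print(mic_volume_command(100))
```

Calling set_mic_volume(100) does the same thing as the .bat file; the builder function is split out so you can check the command without actually running it.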

NirCmd also has some commands for shutdown/reboot/standby/logoff if you are sick of digging in the Windows menu for those commands.

REST API Doc article

I think the big problem with API docs is what to document. And it usually ends up with a comment per method, where you have “createFoo = creates a foo; deleteFoo = deletes a foo” and no real content. The question always becomes “well, what else?” And the real value of API docs is everything in between, the things you can’t infer by just looking at method signatures.

The following article is a good list of the other things you need to cover. It talks specifically about REST, but holds true for other API types, also. The TL;DR: what does a consumer need to know to actually use this API? How do you do auth? What are the business uses? What standards does it really follow? What are the other gotchas? What are the SLAs or other requirements for the service?

https://dzone.com/articles/rest-api-documentation-part-1
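As a rough illustration of the difference, here is a made-up method whose comment tries to answer the “well, what else?” question instead of restating the name. Everything here (FooService, create_foo, the token check, the uniqueness rule) is hypothetical and only stands in for a real service.

```python
class FooService:
    """Hypothetical in-memory stand-in for a REST resource (illustration only)."""

    def __init__(self, valid_token):
        self._valid_token = valid_token
        self._foos = {}

    def create_foo(self, name, token):
        """Create a foo and return its record.

        Auth: pass the API token issued for this service; a bad token
            raises PermissionError.
        Gotchas: names are unique, so creating a duplicate raises
            ValueError instead of overwriting the existing foo.
        Typical use: call once per new foo, then keep the returned "id"
            for later lookups.
        """
        if token != self._valid_token:
            raise PermissionError("invalid token")
        if name in self._foos:
            raise ValueError(f"foo {name!r} already exists")
        record = {"id": len(self._foos) + 1, "name": name}
        self._foos[name] = record
        return record


service = FooService(valid_token="secret")
print(service.create_foo("first", token="secret"))
```

The “creates a foo” line is still there, but the parts a consumer can’t infer from the signature (auth, failure modes, what to do with the result) are what make the comment worth reading.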

The hoarding of 25,000 manuals

I started tech writing right around the time print was dying. My first job involved writing software documentation as a manual and as WinHelp, and it all went electronic after that. I also briefly wrote for print books about Linux and Emacs. So I have a box with versions of those manuals, and that box has bounced across the country with me, and is currently sitting in a storage space I never visit. But I always think about that chunk of print in storage, and what the end game is there. As I get new pieces of electronic junk in my life, I try to save the PDF versions of the manuals and get rid of the paper, but there’s a certain something about the print versions, too.

Anyway – Jason Scott of the Textfiles web site recently ran into a seller of manuals who was going out of business and was going to junk an insanely large archive of print books. It’s an impressive collection of old print books going back to the 30s, apparently old radio equipment or parts manuals and whatnot. The owner gave him carte blanche to grab whatever he wanted, and he’s currently stuffing everything he can into a storage space for later dissemination.

Check out the photos of the effort – this is some real print manual pornography, if that’s your sort of thing: http://ascii.textfiles.com/archives/4683

There’s also a PayPal to help him cover the costs of storage and shipping and whatnot, if you want to chip in a few bucks for this massive effort to save some classic dead trees.

Removing the Adobe Update Installer from the Mac menu bar

So at some point, Adobe Creative Cloud added a second notification to the menu bar on my Mac. There’s already one that opens the CC app, and always sits in the menu bar. And everyone else wants something in the menu bar. But Adobe started showing this second icon, with a perpetual 1 next to it, even if there were no updates, and all it does is open the first one. No combination of preference-toggling would make it go away, and neither would quitting CC.

Here’s what I had to do:

  1. Quit Creative Cloud.
  2. Rename Applications/Utilities/Adobe Creative Cloud/ACC/Creative Cloud to Creative CloudX.
  3. Go to the new weird menu bar thing and select Open Updater.
  4. You’ll see the old Adobe Updater. If you see the new CC app, you did 1-3 wrong.
  5. Go to Preferences and turn off Notifications.
  6. Un-rename the file you changed in step 2.
  7. Start the CC app.
  8. You probably have to repeat this every time Adobe issues an update, which will be in like five minutes.