Mild-Mannered Canadian Fury

Doug Stephen is Politely Peeved

AppleScript and Friends: Octopress, TextExpander, and Ruby


Thu, 06 Sep 2012

I hate UTM parameters. I know why people use them. But I still hate them. When I link to a post that I find in my RSS reader, I always want to strip these parameters out, because, well, they aren’t technically valid. Also, I hate them.

A few days ago, I automated the process of pulling the title and link out of Safari tabs that I wanted to write about. I think I should back up a little bit first, though.

When I first started running this blog on Octopress, I knew I wanted to write link posts. At the time, this feature wasn't built into Octopress, so I followed Jonathan Poritsky's tutorial on how to create a linked list for Octopress. After setting this up, I decided I also wanted a way to generate link posts from the command line, so I hacked my Rakefile and tweaked the new_post task to allow for this. It looks something like this:

new_post Rake task
desc "Begin a new post in #{source_dir}/#{posts_dir}"
task :new_post, :title, :external_url do |t, args|
  raise "### You haven't set anything up yet. First run `rake install` to set up an Octopress theme." unless File.directory?(source_dir)
  mkdir_p "#{source_dir}/#{posts_dir}"
  args.with_defaults(:title => 'new-post', :external_url =>'')
  title = args.title
  external_url = args.external_url
  filename = "#{source_dir}/#{posts_dir}/#{Time.now.strftime('%Y-%m-%d')}-#{title.to_url}.#{new_post_ext}"
  if File.exist?(filename)
    abort("rake aborted!") if ask("#{filename} already exists. Do you want to overwrite?", ['y', 'n']) == 'n'
  end
  puts "Creating new post: #{filename}"
  open(filename, 'w') do |post|
    post.puts "---"
    post.puts "layout: post"
    post.puts "title: \"#{title.gsub(/&/,'&amp;')}\""
    post.puts "date: #{Time.now.strftime('%Y-%m-%d %H:%M')}"
    post.puts "comments: false"
    post.puts "published: true"
    post.puts "categories: "
    # Link posts only: add external-url when a URL was passed in
    if !external_url.empty?
      post.puts "external-url: " + external_url
    end
    post.puts "description: "
    post.puts "---"
  end
end

The only major change that I made was the introduction of an external_url argument. If this argument isn't passed, it defaults to an empty string. Then, when I generate the front matter, I check whether this string is non-empty; if so, I add the external-url: front matter property. Pretty simple.
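
To see the optional-argument logic in isolation, here is a minimal Ruby sketch that builds the same front matter as a string instead of writing it to a file. The method name build_front_matter and its date parameter are hypothetical, introduced only so the output is predictable; the real task writes straight to the post file and uses Time.now.

```ruby
# Hypothetical helper mirroring the front matter written by the new_post task.
# The date is passed in (rather than using Time.now) so the output is predictable.
def build_front_matter(title, date, external_url = '')
  lines = []
  lines << "---"
  lines << "layout: post"
  lines << "title: \"#{title.gsub(/&/, '&amp;')}\""
  lines << "date: #{date.strftime('%Y-%m-%d %H:%M')}"
  lines << "comments: false"
  lines << "published: true"
  lines << "categories: "
  # Link posts only: an empty external_url means this is a regular post
  lines << "external-url: #{external_url}" unless external_url.empty?
  lines << "description: "
  lines << "---"
  lines.join("\n") + "\n"
end

date = Time.new(2012, 9, 6, 10, 30)
puts build_front_matter("Regular Post", date)
puts build_front_matter("Linked Article", date, "http://example.com/article")
```

Passing the URL is the only thing that distinguishes a link post from a regular one; everything else in the front matter is identical.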

After using this for a few months and doing everything else manually via copy and paste, I noticed that I had settled into a pattern that made much of link post creation automatable: the title of a link post is almost always the original title of the article. This means that if I could find a way to automatically capture a page's title and URL, I could automate the creation of link posts.

Enter my trusty friends, TextExpander + AppleScript.

The Easy Part

It turns out that getting those two elements was easy as hell; it’s making them play nicely with Rake and formatting them all pretty-like that is a bit of a pain.

If you’re just interested in capturing the title and URL from the frontmost tab of the frontmost window, that’s plenty easy:

Capturing Title and URL
tell application "Safari"
    set pageTitle to name of document 1
    set currentURL to URL of current tab of window 1
end tell

You can then do whatever you’d like with those strings. Now, the problems:

Commas in Titles

Unfortunately, Rake is incredibly strict in its interpretation of commas in task arguments; they can't be escaped, quoted, or otherwise coaxed into being treated as anything other than an argument delimiter. This means that if a post title has a comma in it, everything after the comma gets treated as a second argument. For a regular Octopress install, this means the title gets truncated. For my link log posts, it means the second half of the title gets stuck in the URL field and the URL itself gets thrown out. Ugh.
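
To make the failure concrete, here is a deliberately simplified Ruby model of what happens. It is not Rake's actual argument parser; it just assumes the text between the brackets is split on every comma and the pieces are handed to the task's named arguments in order. The method name split_task_args is hypothetical. It also helps to remember that the shell removes the double quotes before Rake ever sees them: rake new_post["Apples, Oranges","http://example.com/fruit"] reaches Rake as new_post[Apples, Oranges,http://example.com/fruit].

```ruby
# Simplified model (NOT Rake's real parser): split the bracketed argument
# string on every comma and assign the pieces to named arguments in order.
def split_task_args(arg_string, names)
  values = arg_string.split(',').map(&:strip)
  # Pieces beyond the named arguments are simply dropped in this model.
  names.zip(values).to_h
end

names = [:title, :external_url]

# No comma in the title: title and URL land where they should.
ok = split_task_args('Octopress Tips,http://example.com/tips', names)

# A comma in the title: title becomes "Apples", external_url becomes "Oranges",
# and the real URL falls off the end.
broken = split_task_args('Apples, Oranges,http://example.com/fruit', names)

puts ok[:external_url]
puts broken[:external_url]
```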

Enter AppleScript, again. Except not really. AppleScript doesn't have great string manipulation and replacement utilities built in, and a lot of the custom-rolled ones look really, really ugly. So the easiest way, for me anyway, is to have AppleScript call out to the shell and run an environment that can handle the ultimate in string replacement utilities: regular expressions. In this case, I decided to go with Ruby and its awesome gsub method.

This won't be a lesson in regular expressions. They aren't hard, but learning the ropes can be a little hairy because the syntax is so ugly. I recommend this Jeff Atwood post, and the links he provides therein, for a good regex primer. gsub can also take a plain string as its pattern and do simple character replacement, but here I'll use it with a regex. The comma case is easy enough that it probably doesn't need a full-on regex, but I was in a hurry when I made this snippet and was doing a lot of copying and pasting from other Ruby scripts I've written. Here, we're going to replace all commas with their HTML entity, &#44;.
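
Stripped of the AppleScript plumbing, the Ruby side of the comma replacement is a one-liner. This sketch (the method name escape_commas is hypothetical) shows that the regex-pattern and string-pattern forms of gsub give the same result for a single literal character like a comma.

```ruby
# Hypothetical helper: the Ruby that the comma handler runs, minus the shell.
def escape_commas(title)
  title.gsub(/,/, '&#44;')   # regex pattern, as in the handler
end

title = 'Apples, Oranges, and Pears'
puts escape_commas(title)          # prints Apples&#44; Oranges&#44; and Pears
# A plain string pattern does the same job for a single literal character.
puts title.gsub(',', '&#44;')      # prints Apples&#44; Oranges&#44; and Pears
```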

We'll define an AppleScript handler¹ called replaceCommasInStringWithHTMLEntity:

Replace all commas with their HTML entities.
on replaceCommasInStringWithHTMLEntity(theTitle)
  set rubyCommand to quote & "puts " & "'" & theTitle & "'" & ".gsub( /,/, '&#44;' )" & quote
  set strippedTitle to do shell script "ruby -e " & rubyCommand
  return strippedTitle
end replaceCommasInStringWithHTMLEntity

The building up of the Ruby command is a little ugly. The quote constant is AppleScript's built-in constant for a double-quote character, and the ampersand is AppleScript's string concatenation operator, so for a title like Some Title the handler ends up running ruby -e "puts 'Some Title'.gsub( /,/, '&#44;' )". Ruby gives regular expressions such first-class treatment that you can create them with regular expression literals, much as you write number and string literals. Regex literals in Ruby are denoted by enclosing the pattern in / on either side, as you can see in the gsub call above. puts is the Ruby method that writes to stdout, and that output (minus its trailing newline) is what the do shell script command returns.
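
One thing this string splicing doesn't handle is a title containing an apostrophe: the title is dropped between single quotes in the Ruby source, so a title like Doug's Post ends the Ruby string early and the command fails with a syntax error. Here is a sketch of a sturdier pattern, shown from Ruby rather than AppleScript: keep the Ruby program fixed and pass the title in as a command-line argument (ARGV) instead of pasting it into the source. On the AppleScript side, quoted form of is the built-in way to hand such an argument to do shell script safely. The method name is hypothetical, and the sketch assumes a ruby interpreter can be launched as a subprocess.

```ruby
require 'open3'
require 'rbconfig'

# The Ruby program is a constant; the title never becomes part of its source.
ESCAPE_COMMAS_PROGRAM = "puts ARGV[0].gsub(/,/, '&#44;')"

# Hypothetical helper: run the comma replacement in a separate ruby process,
# passing the title as its own argv entry (no shell involved, nothing to quote).
def escape_commas_in_subprocess(title)
  # '--' ends ruby's own option parsing, so a title starting with '-' is still data
  out, status = Open3.capture2(RbConfig.ruby, '-e', ESCAPE_COMMAS_PROGRAM, '--', title)
  raise "ruby exited with #{status.exitstatus}" unless status.success?
  out.chomp # puts adds a trailing newline
end

puts escape_commas_in_subprocess("Doug's Post, Part 2")   # prints Doug's Post&#44; Part 2
```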

One little caveat: the Rake new_post task escapes every ampersand in the title into its HTML entity, &amp;, which would turn the &#44; we just inserted into &amp;#44; and leave a literal "&#44;" in the published title. This is easily fixable, though; the task also does this with gsub, matching ampersands in any context. By changing the line post.puts "title: \"#{title.gsub(/&/,'&amp;')}\"" to post.puts "title: \"#{title.gsub(/& /,'&amp; ')}\"" (notice the space after the ampersand), the replacement only happens when an ampersand is directly followed by a space, so the comma entities are left alone.
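
Here is a small Ruby sketch comparing the stock ampersand escape with the space-only variant. The third method, which uses a negative lookahead to skip anything that already looks like a numeric entity, is my own hypothetical alternative, not something Octopress ships; unlike the space-only version, it also escapes the ampersand in something like AT&T.

```ruby
# Three ways to escape ampersands in a post title before it goes into the front matter.
def escape_all(title)          # stock new_post behavior
  title.gsub(/&/, '&amp;')
end

def escape_spaced(title)       # only "& " (an ampersand followed by a space)
  title.gsub(/& /, '&amp; ')
end

def escape_non_entities(title) # hypothetical: skip numeric entities like &#44;
  title.gsub(/&(?!#\d+;)/, '&amp;')
end

title = 'Salt&#44; Pepper & AT&T'
puts escape_all(title)           # Salt&amp;#44; Pepper &amp; AT&amp;T
puts escape_spaced(title)        # Salt&#44; Pepper &amp; AT&T
puts escape_non_entities(title)  # Salt&#44; Pepper &amp; AT&amp;T
```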

So now we can automatically generate our titles. There might be some cruft stuck on there, like the name of the site or the author, but there's no really intelligent way to eradicate that stuff without going into full-on text parsing, which is a non-trivial problem domain.

About that UTM stuff

So we're finally back to where we started. I hate UTM parameters. If I open a link from my feed reader, it will usually have a bunch of FeedBurner crap tacked onto the end. Not only would keeping this stuff on there piss me off, it would also be inaccurate: people following a link from my site would be telling the linked site's analytics that they came from somewhere they didn't.

We’re going to rip this stuff out using the same Ruby trickery:

Strip UTM parameters
on stripUTMFromURL(urlToStrip)
  set rubyCommand to quote & "puts " & "'" & urlToStrip & "'" & ".gsub( /\\?utm.*$/, '' )" & quote
  set strippedURL to do shell script "ruby -e " & rubyCommand
  return strippedURL
end stripUTMFromURL

The only trickery here is that the question mark needs to be escaped in the regex, since it has a special meaning (it makes the preceding token optional), and the backslash itself needs to be escaped in the AppleScript string. Hence, \\?. This gsub call just says "as soon as you encounter the string '?utm', match it and everything after it up to the end of the line, and replace all of that with the empty string."
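
Note that this pattern only fires when a UTM parameter comes first in the query string, and it takes everything after it along for the ride, including any non-UTM parameters or #fragment that follow. Here is a Ruby sketch showing that behavior next to a hypothetical alternative that parses the query with Ruby's standard uri library and drops only parameters whose names start with utm_. Both method names are made up for illustration, and the alternative re-encodes the parameters it keeps, so their percent-encoding may not be byte-for-byte identical to the original.

```ruby
require 'uri'

# The handler's pattern as Ruby sees it ("\\?" in AppleScript becomes "\?").
def strip_utm_regex(url)
  url.gsub(/\?utm.*$/, '')
end

# Hypothetical alternative: remove only utm_* query parameters, keep the rest.
def strip_utm_params(url)
  uri = URI.parse(url)
  return url if uri.query.nil?
  kept = URI.decode_www_form(uri.query).reject { |name, _value| name.start_with?('utm_') }
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end

feed_link = 'http://example.com/post?utm_source=feedburner&utm_medium=feed#comments'
mixed     = 'http://example.com/post?id=5&utm_source=feedburner'

puts strip_utm_regex(feed_link)   # http://example.com/post (the #comments fragment is lost too)
puts strip_utm_regex(mixed)       # unchanged, because "?utm" never appears
puts strip_utm_params(feed_link)  # http://example.com/post#comments
puts strip_utm_params(mixed)      # http://example.com/post?id=5
```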

With all the tools in place, the entirety of my ;newline TextExpander snippet, set up as an AppleScript snippet, looks like this:

;newline TextExpander AppleScript Snippet
on replaceCommasInStringWithHTMLEntity(theTitle)
  set rubyCommand to quote & "puts " & "'" & theTitle & "'" & ".gsub( /,/, '&#44;' )" & quote
  set strippedTitle to do shell script "ruby -e " & rubyCommand
  return strippedTitle
end replaceCommasInStringWithHTMLEntity

on stripUTMFromURL(urlToStrip)
  set rubyCommand to quote & "puts " & "'" & urlToStrip & "'" & ".gsub( /\\?utm.*$/, '' )" & quote
  set strippedURL to do shell script "ruby -e " & rubyCommand
  return strippedURL
end stripUTMFromURL

tell application "Safari"
    set pageTitle to name of document 1
    set currentURL to URL of current tab of window 1
end tell

set pageTitle to replaceCommasInStringWithHTMLEntity(pageTitle)
set currentURL to stripUTMFromURL(currentURL)

return "rake new_post[\"" & pageTitle & "\",\"" & currentURL & "\"]"

Typing ;newline into a terminal spits out the string rake new_post["Frontmost Tab Title with Commas Escaped","http://the.url"].
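
For reference, the snippet's string handling can be expressed in plain Ruby. This sketch (the method name rake_new_post_command is hypothetical) applies the same comma replacement and UTM regex as the two handlers, then formats the rake new_post[...] string the snippet returns. It doesn't talk to Safari, so the title and URL are passed in.

```ruby
# Hypothetical Ruby equivalent of the ;newline snippet's string handling.
def rake_new_post_command(page_title, page_url)
  title = page_title.gsub(/,/, '&#44;')   # same job as replaceCommasInStringWithHTMLEntity
  url   = page_url.gsub(/\?utm.*$/, '')   # same job as stripUTMFromURL
  "rake new_post[\"#{title}\",\"#{url}\"]"
end

puts rake_new_post_command('Apples, Oranges, and Pears',
                           'http://example.com/fruit?utm_source=feedburner&utm_medium=feed')
# prints rake new_post["Apples&#44; Oranges&#44; and Pears","http://example.com/fruit"]
```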

And so, I have totally automated my Octopress link post generation.

Please to enjoy.



  1. Handlers are what AppleScript calls functions.