As you may have picked up if you’ve been reading this blog, I’ve been teaching myself Ruby lately: I have a Sinatra-based Arduino project that I’m working on, another Rails-based personal project¹, the hacking on the blogging engine that powers this site, and general scripting.
I’ve also been a little bored at work lately, because the project that I’m currently working on is in stasis. Luckily, the lab where I work is helping to put on this year’s Dynamic Walking Conference right here in Sunny Pensacola. Click at your own risk, as the site is still a work in progress.
Anyway, since my normal day-to-day stuff is being punted due to matters that are out of my hands, I’m lending my experience with working on the web to the folks organizing the conference site, and today I had the opportunity to flex a little bit of Ruby muscle.
Be warned: if you’re a Ruby pro then this is going to be intensely boring and plain to you, because this is probably the most basic task on the face of the planet. But it allowed me to turn what the organizing folks thought would be an all-day ordeal into a 20-minute task I was able to knock out in under one cup of coffee². It’s not the most exciting stuff, but at the same time I find that talking about and explaining concepts as you’re learning them is a great way to cement them in your head. So here we are, on this blog, wherein I will bore you with lame elementary Ruby code.
In short, I was tasked with revamping the list of attendees page. Currently, it is a series of screenshots from another reporting system, and a lot of the images don’t scale well. I was to create a table containing all 153 attendees, including their uploaded photos. All attendee information was self-submitted, which made the whole thing a little tricky, but I’ll get into that later.
The attendee information is stored in a separate online registration service. All I had to work with was a report generator built into this service, which gave me a variety of formats that were all but useless except for one: comma-delimited text files (.csv files).
I’m a programmer! Delimited plaintext is something I can work with!
So, the hiccups. The first problem is that all of the images were stored in a near useless manner, using strings of numbers as file names. Probably hashed or something. It would be difficult to tell which attendee each image corresponded to. Plus there was the issue of downloading 150-something images from the temporary, expiring links that the report generator gave me. Sounds like we have a task that needs automating.
- Download all of the images submitted by each attendee from the provided temporary URIs
- Associate each image with an attendee
This, friends, turned out to be cake.
I ran a report that spat out the full name of each attendee followed by the URI for their picture, and then I passed that CSV file through a short script that does the following:
- I read every line of the CSV file into an array. I became aware of line-by-line iterators like `IO.foreach` after doing this, but for such a basic script it hardly seems to matter.
- I split each line into a name and a URL. CSVs were made for this stuff, baby.
- I make a new writable file in binary mode, named after the attendee. If you don’t open in binary mode, Ruby treats the incoming data as text, which on some platforms (Windows in particular) means newline translation, and the image will be all sorts of messed up. This is obviously not what we want. This behavior is inherited from Ruby’s C underpinnings.
- I use the OpenURI module to open the URL of the image as a file, and pipe it into the jpg I just created. This creates a jpg with the attendee’s image, with their name as the filename.
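Put together, the steps above can be sketched roughly as follows. This is a minimal reconstruction from the description, not the exact script: the method names, the `out_dir` argument and the file-name cleanup are assumptions added for illustration, and `URI.open` assumes a reasonably recent Ruby with OpenURI available.

```ruby
require "open-uri"

# Turn a "Full Name,http://..." line into [name, url].
# Splits on the first comma only, since a URL may itself contain commas.
def parse_line(line)
  name, url = line.chomp.split(",", 2)
  [name.strip, url.strip]
end

# Build a file name from the attendee's name, dropping characters
# that are awkward in file names (hypothetical cleanup rule).
def image_filename(name)
  "#{name.gsub(%r{[/\\:*?"<>|]}, "").strip}.jpg"
end

# Download every image listed in the report into out_dir.
def download_images(csv_path, out_dir = ".")
  IO.readlines(csv_path).each do |line|
    next if line.strip.empty?
    name, url = parse_line(line)
    # "wb" = writable, binary mode, so the image bytes are written untouched.
    File.open(File.join(out_dir, image_filename(name)), "wb") do |file|
      URI.open(url) { |remote| file.write(remote.read) }
    end
  end
end
```

Calling `download_images("report.csv", "faces")` (both names hypothetical) would then write one jpg per attendee into an existing `faces` folder.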
Easy. Took just a few minutes to run, and I had a complete set of pretty faces.
Next, I needed to create a table to hold all this information. We needed to display the image, name, organization, and country of origin for each attendee. You’d think this would be easy, but a few things threw a wrench into it.
I ran a CSV report containing the relevant information, and then I tried out a first-pass script that turned each line of the report into a table row.
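As a rough sketch of what such a first pass can look like (not the exact script): the column order, the `images/` folder and the method names below are assumptions made for illustration.

```ruby
# Naive first pass: split each line on commas and emit one table row.
# Assumed column order (hypothetical): full name, organization, country.
def naive_row(line)
  name, organization, country = line.chomp.split(",")
  "<tr><td><img src=\"images/#{name}.jpg\"></td>" \
    "<td>#{name}</td><td>#{organization}</td><td>#{country}</td></tr>"
end

# Wrap all of the rows in a table element.
def naive_table(lines)
  "<table>\n" + lines.map { |line| naive_row(line) }.join("\n") + "\n</table>"
end
```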
This looked completely fine at first, until I noticed some weirdness.
1. Some people submitted their info in all caps.
2. Some organizations have commas in their name, e.g. “University of Colorado, Boulder”
3. Some people had random quotation marks in their info
4. A lot of the people visiting from South Korea entered their country of origin as “Korea, Republic of”
5. The list wasn’t alphabetized by the report service when I generated the report, and I saw no immediately obvious way to do so.
Numbers 3 and 4 were easy enough to fix using my text editor on the CSV file with a quick find/replace. Natch.
Next was addressing those commas. You obviously want to keep the commas in there, but the CSV format uses commas as a delimiter. What do?
This was a fairly easy fix as well. Since any character in the ASCII character set can be represented in HTML by putting its ASCII code in an HTML entity, and since the output I am producing is HTML, I simply ran a find/replace for any commas with trailing whitespace³ and replaced them with the `&#44;` HTML entity. There was also an errant ampersand that I replaced with its appropriate entity.
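The fix itself was done by hand in a text editor, but the same substitution can be sketched in Ruby. The method name is hypothetical, and treating only a space or tab as "trailing whitespace" is an assumption.

```ruby
# Replace punctuation inside fields with HTML entities so that a plain
# split(",") only ever sees the delimiters written by the report generator.
def protect_field_commas(line)
  line
    .gsub(/&(?!#?\w+;)/, "&amp;") # escape bare ampersands, leave existing entities alone
    .gsub(/,(?=[ \t])/, "&#44;")  # a comma followed by a space or tab is not a delimiter
end
```

The ampersand pass runs first so that the `&` in the freshly inserted `&#44;` is never escaped a second time.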
Now we needed to alphabetize the list by last name and fix the letter case used by some of the submitters. I didn’t want to have to do a bunch of extra splits and joins, start working with hashes, or start making objects with attributes in such a simple, simple script. So what to do?
Simply, I ran a new report. This time, the first element of every line is the last name of the attendee, followed by the full name and the rest of the information. Repeated data, sure, but we’re just trying to hack together some quick system automation, not ship code that’ll run on the ISS.
This allows me to sort the original list after the call to `IO.readlines` with a simple `sort` call. Afterwards, when I split each line, I just throw away the first element in the split array.
Letter case was easy enough. I just used a `gsub` with a regular expression to enforce a sort of titlecase on the Country and Name fields, which brought the script to its final version.
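For illustration, a final version along those lines might look like the sketch below. It is a reconstruction under assumptions, not the exact script: the column order (last name, full name, organization, country), the `images/` folder and the method names are all hypothetical, and it sorts with `sort_by(&:downcase)` rather than a plain `sort` so that ALL-CAPS last names still land in alphabetical order.

```ruby
# Enforce a rough titlecase: "JANE DOE" becomes "Jane Doe".
def titlecase(text)
  text.gsub(/[[:alpha:]]+/) { |word| word.capitalize }
end

# Build one table row from "Last,Full Name,Organization,Country".
def attendee_row(line)
  # The leading last name only exists for sorting, so it is thrown away here.
  _last, name, organization, country = line.chomp.split(",").map(&:strip)
  # The image file keeps the name as submitted; only the displayed text is recased.
  "<tr><td><img src=\"images/#{name}.jpg\"></td>" \
    "<td>#{titlecase(name)}</td><td>#{organization}</td>" \
    "<td>#{titlecase(country)}</td></tr>"
end

# Sort the raw lines by their leading last name, then render the table.
def attendee_table(lines)
  rows = lines.sort_by(&:downcase).map { |line| attendee_row(line) }
  "<table>\n#{rows.join("\n")}\n</table>"
end
```

Feeding it the lines from `IO.readlines("report.csv")` (file name hypothetical) and writing the returned string to a file gives a table that can be pasted into the page.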
And there it is. A perfectly generated table with all of the requisite information for 153 different people that I was able to put together in around 30 minutes, thanks to the wonders of scripting.
Seriously, if you consider yourself even half a geek and you’ve never even begun to look at how you can save yourself time by learning to write scripts in any language, you’re only doing yourself a disservice. The possibilities are endless for what you can do to glue small tasks together and make your life a million times easier than you ever thought possible.
¹ Which will probably end up moving to Sinatra, since the project only needs Ruby for a simple backend API with no DB connections. I just didn’t know enough about Sinatra when I started the original project. Nuts to me for not doing my research.
² The cup of coffee is my standard unit of time measurement at the lab. I drink a lot of coffee.
³ The CSV format doesn’t include whitespace in its delimiter, so any comma with trailing whitespace was obviously not a comma created by the CSV report generator.