In Praise of sed and Copy-Paste
Nothing useful or interesting to report today.
A project that I had been working on a while ago has finally percolated back up to being an emergency, so I need to finish it. I had a really nice proof of concept that worked on a particular subset of the data. It also output the results in a format that could only be used to validate the calculation against what had been done in the past. So now that I remember how the code works, I thought that I had to find a way to get it to work on the other major class of data, and I thought that I had to rewrite the fussy formatting stuff to export the results according to the spec.
And then I remembered that this code only needs to run once a year. If I can get it to work on the magical date in June, then I can put it aside for months and months and months.
I also realized that the best way for me to deal with both of my problems is some judicious copy-paste (or text manipulation with sed). The most significant difference between the two data sets is the names of columns that are thrown away in an early step of the analysis. Also, this particular data source only publishes as CSV (and is not easy to get directly from the database). So I’ll just rename the bad columns in the new data to match the names of the bad columns in the old data. It doesn’t really matter whether they record the same sorts of information, because the bad columns are thrown out.
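For anyone who wants the rename step spelled out, here is a minimal sketch. The column names in `COLUMN_RENAMES` are hypothetical placeholders, not the real columns from either data set. With sed, the rough equivalent is a substitution limited to the header line, such as `sed '1s/junk_a_new/junk_a_old/'`. The Python version below swaps whole field names instead of substrings, so a name that happens to contain another name won’t get mangled.

```python
import csv
import io

# Hypothetical mapping: new-data column names -> old-data column names.
# These are placeholders, not the real columns from either data set.
COLUMN_RENAMES = {
    "junk_a_new": "junk_a_old",
    "junk_b_new": "junk_b_old",
}


def rename_header(csv_text: str, renames: dict[str, str]) -> str:
    """Return csv_text with header names swapped according to renames.

    Only the first row is touched; data rows pass through unchanged.
    Columns not listed in renames keep their original names.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to rename
        return ""
    writer.writerow([renames.get(name, name) for name in header])
    writer.writerows(reader)  # copy the remaining rows as-is
    return out.getvalue()


if __name__ == "__main__":
    sample = "id,junk_a_new,value,junk_b_new\n1,x,10,y\n2,z,20,w\n"
    print(rename_header(sample, COLUMN_RENAMES), end="")
```

Once the renamed new data has the same header as the old data, it can go through the existing pipeline untouched.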
Also, the end goal of this particular project is to give a colleague an Excel-type spreadsheet. What is super-annoying is that information I encode internally in a single column is expected to be recorded sometimes in one column and sometimes in two columns in the final product. Yes, this is super-weird. My column contains a number from the set {0, 1, 2}. If the value is 0, then column C of the Excel spreadsheet should contain some specific words, and column D should be empty. If the value is 1, then column C should contain some other words, and column D should be empty. And if the value is 2, then column C should contain the same words as in case 1, and column D should contain some additional words. Eff that, I’m going to save my results as a CSV and then do a find and replace on ",0\n", ",1\n", and ",2\n" (which catches the code at the end of each row), swapping in the corresponding text. Then I’ll open the file in some sort of spreadsheet app and export it in Excel format.
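Here is a rough sketch of that find-and-replace step. It assumes the code is the last field of each row and that the CSV has no header row. The strings in `CODE_TO_COLUMNS` are hypothetical placeholders for whatever wording the spec actually requires. It also swaps in a slightly different technique: it parses the file with Python’s `csv` module rather than doing a raw text replace. That way a final row without a trailing newline still gets converted, which a literal search for ",2\n" would miss.

```python
import csv
import io

# Hypothetical placeholder text; the real wording comes from the spec.
# Each code maps to (text for column C, text for column D).
CODE_TO_COLUMNS = {
    "0": ("words for case 0", ""),
    "1": ("words for case 1", ""),
    "2": ("words for case 1", "additional words for case 2"),
}


def expand_code_column(csv_text: str) -> str:
    """Replace the last field of each row (0, 1, or 2) with two fields, C and D.

    Assumes no header row; an unknown code raises KeyError.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:  # skip blank lines
            continue
        *rest, code = row
        col_c, col_d = CODE_TO_COLUMNS[code]
        writer.writerow([*rest, col_c, col_d])
    return out.getvalue()


if __name__ == "__main__":
    print(expand_code_column("a,0\nb,1\nc,2"), end="")
```

The output is still a CSV, so the last step stays the same: open it in a spreadsheet app and export it in Excel format.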
And that is today’s update from the glamorous world of tech.