Sunday, September 10, 2017

Same Topic, Various Guises

I'm reading "Writing Your Dissertation in Fifteen Minutes a Day" by Joan Bolker, and one passage stuck with me enough to blog about it:

"Some people seem always to have known what they want to write their dissertations about. They are the lucky ones...Some, like me, have written their way through the same topic in various guises often enough so they know it's theirs for life."

The second sentence there stuck out to me, because whether it's robots or people, pulling information from messy data sets appears to be kind of my schtick when it comes to research.  For one of my final projects in college (I had two: one for Applied Math, one for CompSci), I obtained several decades' worth of U.S. government census data for each of the Saint Louis Metropolitan Area's counties and built a model of the population in-flows and out-flows for each county (spoiler: STL City and County had serious out-flows to St. Charles and Jefferson County).

I was really proud of the big Excel spreadsheets I made. For the time span I was looking at, the Census data was spread across a couple of websites, and each source had its own file format, so it took a lot of data cleaning to get a single tidy dataset with all the information I needed.  Writing the programs that manipulated the datasets and distilled them into one end result felt easier by comparison, though I remember that getting the model right was a challenge, too.
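The merging step looked roughly like this: read each source in its own format, map its columns onto one common schema, and combine the results. Here's a minimal sketch of that idea. The column names, delimiters, and population figures below are made-up placeholders, not the actual Census files or values:

```python
import csv
import io

# Two hypothetical sources with different delimiters and column names.
# All figures are placeholders, not real Census values.
source_a = io.StringIO(
    "County,Year,Population\n"
    "St. Louis City,1990,396685\n"
    "St. Charles,1990,212907\n"
)
source_b = io.StringIO(
    "GEO_NAME|DATE|POP\n"
    "St. Louis City|2000|348189\n"
    "St. Charles|2000|283883\n"
)

def normalize(fileobj, delimiter, name_col, year_col, pop_col):
    """Read one source and emit rows in a common (county, year, population) shape."""
    reader = csv.DictReader(fileobj, delimiter=delimiter)
    for row in reader:
        yield {
            "county": row[name_col],
            "year": int(row[year_col]),
            "population": int(row[pop_col]),
        }

# Map each source's quirks onto the shared schema, then combine.
combined = list(normalize(source_a, ",", "County", "Year", "Population"))
combined += normalize(source_b, "|", "GEO_NAME", "DATE", "POP")
combined.sort(key=lambda r: (r["county"], r["year"]))
```

Once everything is in one schema, the modeling code only ever has to deal with a single format, which is what made the downstream work feel easy.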

There are parallels to this in my current thesis project for finishing my M.S.: again, it took forever to put together the infrastructure that collects robot data.  What makes this one more complicated is that instead of working with a finite dataset, I'm constantly finding that I need (a) more examples of the driving route, and (b) different routes to compare against.

Unfortunately, I haven't found a good way to automate the ROS/Python/bash scripts so that all the ROS code I need runs in parallel, and that means there's a lot of set-up each time I want to collect more driving data.  The good part is that I have some solid bash/Python scripts that automate the data cleaning and formatting part after the data has been collected.
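One direction the parallel-launch automation could take is a small supervisor script that starts every process in the collection pipeline, waits on all of them, and tears them all down together on Ctrl-C. This is just a sketch of the pattern; the two commands are stand-ins, not my actual ROS launch scripts:

```python
import signal
import subprocess
import sys

# Hypothetical stand-ins for the processes that need to run in parallel
# during a data-collection session (e.g. roscore, rosbag record, drivers).
COMMANDS = [
    [sys.executable, "-c", "print('roscore stand-in')"],
    [sys.executable, "-c", "print('rosbag record stand-in')"],
]

def run_pipeline(commands):
    """Launch every command in parallel; return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]

    def shutdown(signum, frame):
        # One Ctrl-C stops the whole session instead of each process.
        for p in procs:
            p.terminate()
        sys.exit(1)

    signal.signal(signal.SIGINT, shutdown)
    return [p.wait() for p in procs]

if __name__ == "__main__":
    run_pipeline(COMMANDS)
```

ROS's own `roslaunch` covers some of this for ROS nodes specifically; a wrapper like the one above would mainly help with the non-ROS setup steps around it.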

Conclusion: It might be worth spending a little more time on getting the data collection infrastructure automated, if that would speed up the rest of the process.
