The Messages Corpus Scripts

In 2014, I created a Bash script and an R script to make it easier to search through message history stored by Apple’s Messages app for Mac. The logs are XML plist files, but they are saved in a binary format that can’t be read in a normal text editor. The Bash script finds all message log files matching whatever criteria you give it, converts them to raw XML, removes duplicates, and stores the resulting files in a new folder.
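
As a rough illustration of those steps (not the actual script), the find, convert, and de-duplicate pass could be sketched in shell like this. The function name convert_logs, the default *.ichat file pattern, the numbered output file names, and the checksum-based duplicate check are all assumptions made for the example. plutil is the plist conversion tool that ships with macOS.

```shell
#!/bin/sh
# Sketch only, not the original script. Assumptions: log files match
# *.ichat, plutil (macOS) is available, and "duplicate" means the
# converted XML is byte-identical (compared with cksum).

# Usage: convert_logs SRC_DIR DEST_DIR [NAME_PATTERN]
# The body runs in a subshell, so its variables stay local to the call.
convert_logs() (
    src=$1
    dest=$2
    pattern=${3:-*.ichat}
    n=0

    mkdir -p "$dest" || exit 1
    seen=$(mktemp "${TMPDIR:-/tmp}/msgseen.XXXXXX") || exit 1
    list=$(mktemp "${TMPDIR:-/tmp}/msglist.XXXXXX") || exit 1

    # Collect matching log files (assumes no newlines in file names).
    find "$src" -type f -name "$pattern" > "$list"

    while IFS= read -r file; do
        n=$((n + 1))
        out="$dest/$(printf '%05d' "$n").xml"

        # xml1 is plutil's plain-text XML plist format.
        if ! plutil -convert xml1 -o "$out" "$file"; then
            echo "skipping unreadable file: $file" >&2
            rm -f "$out"
            continue
        fi

        # Keep only the first copy of each distinct converted file.
        sum=$(cksum < "$out")
        if grep -qxF "$sum" "$seen"; then
            rm -f "$out"
        else
            printf '%s\n' "$sum" >> "$seen"
        fi
    done < "$list"

    rm -f "$seen" "$list"
)
```

With this sketch, something like convert_logs /path/to/message/logs ~/messages-xml would leave one numbered XML file per distinct log in the new folder. The paths here are placeholders, and the optional third argument narrows the file name pattern.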

Once the files are converted to the proper format, the R script reads them and parses all the XML to create a table of conversations. The script supports reading conversations with different people all in one go; in other words, you can import your entire Messages history all at once, and the script will create a separate conversation history for each person. Once you’ve run the R script, I would recommend saving your R workspace so you can reopen it later instead of having to run the R script again.

In my opinion, the most useful feature of the R script is the searchCorpus() function. You can search for any regex and it will find all instances of it, organized by the person each conversation was with. You can also limit your search to conversations with a specific person using the index parameter, e.g. searchCorpus("meeting",index="Jim Smith") will return instances of "meeting" in conversations with Jim Smith.

searchCorpus() shows the line number of every result, which allows you to then use the catTables() function to print a snippet of a specific conversation. For example, if you have a relevant search result in line 200, you can run catTables(start=195,stop=220) to see it in context. There are also some data-analysis functions I’ve been playing with; you can find those in the R script as well.

Thanks to Mark Myslin for his collaboration on this project.

GitHub | Direct Download

Contact me at: fredhope2000 (at)

My Homepage