Playing with Wolfram Alpha and Python

I love the idea of Wolfram Alpha. I haven’t used it enough to tell how the reality of it compares. Mostly what I’ve seen is easter-eggs, which are fun, but I wanted to see if I could do something more substantive.

An article I’d seen in the news lately piqued my interest about the number of PhDs by US State. It took a goodly amount of fiddling before I figured out I could get the answer by submitting this query:

‘How many phds in California?’

Take a look at what this gives you: http://www.wolframalpha.com/input/?i=how+many+phds+in+california%3F

Lots of great information, including the specific answer I want, 331,101.

If I want to do this for all fifty states, I’m going to need to use the API not the interface. I registered for an API account which was quick and easy and checked out the bindings available on their site. The bindings left a lot to be desired, so I decided to just use urllib2.

Lately I can’t abide dealing directly with XML, so I found this nice library xmltodict. I’m sure [insert xml parsing technique] would work just as well or better.

As I was working I realized I also wanted to pull in total population, total adult population, etc. so I could talk about percentages.

I did this all in an IPython Notebook (if you’re not using this, you need to start, it’s totally awesome. Check out http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html).

Here is my notebook (created using Python 2.7): https://gist.github.com/4052828

Here is a rendered version of the notebook: http://nbviewer.ipython.org/4052828/

All in all it was a fun exercise but it still felt like page scraping. The one advantage is that you can ask the same question with slight modifications easily (e.g., how many people in California vs how many adults in Idaho) and get back essentially the same response so that’s handy. And although Wolfram Alpha returns a machine readable data structure (XML), it’s not exactly richly semantically tagged. There are plain text bits that have to be parsed. For example, sometimes a population will be expressed as its raw number, sometimes as 1.2 million. So I had to put special handling in my code for that case. It would be nice if such quantities were available with no plain-text parsing required.

One other adjustment that would be good would be to thread-out the calls to the API. It takes a good amount of time to process all of these serially and there’s no reason they couldn’t be threaded. I’ll definitely take a look at doing this.

If you’re just interested in results, here you go. This is the percentage of adults with PhDs and percentage of adults with at least an associates degree ranked by state from highest to lowest.

EDIT: After writing this, I found http://pypi.python.org/pypi/wolframalpha/1.0. This looks to be a nicer wrapper, I’ll have to take a look and see if it works as advertised. Let me know if you’ve used it successfully.

----------------------------------------------------------------------------------------------------
phds
#1: Washington DC: 3.68%
#2: Maryland: 2.32%
#3: Massachusetts: 2.30%
#4: New Mexico: 1.75%
#5: Vermont: 1.66%
#6: Connecticut: 1.60%
#7: Delaware: 1.56%
#8: Virginia: 1.54%
#9: California: 1.42%
#10: New Jersey: 1.41%
#11: Rhode Island: 1.39%
#12: Colorado: 1.37%
#13: Oregon: 1.35%
#14: Washington State: 1.34%
#15: Pennsylvania: 1.31%
#16: New York: 1.30%
#17: Hawaii: 1.25%
#18: New Hampshire: 1.24%
#19: Arizona: 1.18%
#20: Utah: 1.17%
#21: Minnesota: 1.16%
#22: Montana: 1.14%
#23: North Carolina: 1.13%
#24: Illinois: 1.13%
#25: Nebraska: 1.12%
#26: Maine: 1.12%
#27: Florida: 1.10%
#28: Alaska: 1.10%
#29: Kansas: 1.08%
#30: Tennessee: 1.05%
#31: Missouri: 1.05%
#32: Georgia: 1.04%
#33: Idaho: 1.02%
#34: Wyoming: 1.02%
#35: Iowa: 1.02%
#36: Wisconsin: 0.99%
#37: Michigan: 0.98%
#38: South Dakota: 0.97%
#39: Ohio: 0.97%
#40: Texas: 0.94%
#41: South Carolina: 0.93%
#42: Alabama: 0.92%
#43: Indiana: 0.91%
#44: North Dakota: 0.87%
#45: Mississippi: 0.82%
#46: Oklahoma: 0.81%
#47: Louisiana: 0.81%
#48: Kentucky: 0.80%
#49: Nevada: 0.78%
#50: Arkansas: 0.77%
#51: West Virginia: 0.77%
----------------------------------------------------------------------------------------------------
college graduates
#1: Washington DC: 50.47%
#2: Massachusetts: 48.28%
#3: Connecticut: 45.71%
#4: New Hampshire: 44.77%
#5: Colorado: 44.07%
#6: Vermont: 43.98%
#7: New Jersey: 43.76%
#8: Maryland: 43.53%
#9: Minnesota: 42.94%
#10: Hawaii: 41.90%
#11: Washington State: 41.71%
#12: Virginia: 41.58%
#13: New York: 40.38%
#14: Rhode Island: 39.72%
#15: North Dakota: 39.42%
#16: Maine: 39.06%
#17: Illinois: 39.05%
#18: Oregon: 39.04%
#19: Florida: 38.78%
#20: Nebraska: 38.50%
#21: Montana: 38.27%
#22: Kansas: 38.26%
#23: California: 38.13%
#24: Delaware: 37.30%
#25: South Dakota: 37.15%
#26: Iowa: 36.75%
#27: Wisconsin: 36.73%
#28: Pennsylvania: 36.66%
#29: Utah: 36.56%
#30: Arizona: 36.26%
#31: North Carolina: 35.96%
#32: Michigan: 34.98%
#33: Wyoming: 34.34%
#34: New Mexico: 34.09%
#35: Georgia: 33.89%
#36: South Carolina: 33.78%
#37: Idaho: 33.74%
#38: Ohio: 33.65%
#39: Missouri: 33.54%
#40: Alaska: 33.04%
#41: Texas: 31.96%
#42: Indiana: 31.04%
#43: Oklahoma: 30.72%
#44: Tennessee: 30.29%
#45: Alabama: 30.18%
#46: Nevada: 30.18%
#47: Kentucky: 28.44%
#48: Mississippi: 27.99%
#49: Arkansas: 26.74%
#50: Louisiana: 26.32%
#51: West Virginia: 25.49%
Advertisements

1 Response to “Playing with Wolfram Alpha and Python”


  1. 1 Richard Johnstone November 11, 2012 at 7:56 pm

    Jeff, thanks. Checking for an API to Wolfram Alpha was on my task list for this next week. Thanks for a very nice introduction to it. — Richard


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Twitter-feed

  • Ah that special feeling when a single plot is so elegant and concise that you can delete three other plots and about 1,000 words. 3 weeks ago
  • I can tell that my analysis just completed because my CPU fan finally ceased its high-pitched whine. That is surprisingly satisfying. 4 weeks ago
  • @mpbtownsend cool cool. I'll send something soon 1 month ago
  • @mpbtownsend either way I'm happy to share my AERA paper with you once I write it hahaha. I'd love to get your feedback! 1 month ago
  • @mpbtownsend will you be at AERA this year? I'll be presenting much of the same material 1 month ago

%d bloggers like this: