

Open Data in NYC

Last week, I participated in a BetaNYC meetup in Brooklyn, NY. About 12 developers were there to hear mini-presentations on current projects. Chris Wong, the BetaNYC organizer, demonstrated a real estate property tax mapping application he built in Ruby on Rails. His app scrapes data from the thousands of PDFs the city publishes on individual property tax assessments, exemptions, and abatements, and transforms them into a CSV file, which is imported into Excel, checked for quality, and exported into MongoDB.
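To give a feel for the "checked for quality" step, here is a minimal sketch in Python (not Chris's Ruby code, which was not shown). The column names `address`, `assessed_value`, and `tax_due` and the sample rows are made up for illustration; the idea is simply to separate rows that parse cleanly from rows that need a human look before anything is loaded into a database.

```python
import csv
import io


def clean_rows(csv_text):
    """Split CSV rows into (good, rejected) lists.

    The column names used here are hypothetical; the real app's
    fields are not described in the post.
    """
    good, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            record = {
                "address": row["address"].strip(),
                # Strip thousands separators before converting to a number.
                "assessed_value": float(row["assessed_value"].replace(",", "")),
                "tax_due": float(row["tax_due"].replace(",", "")),
            }
        except (KeyError, ValueError, AttributeError):
            # Missing column, non-numeric value, or a short row (None value).
            rejected.append(row)
            continue
        if not record["address"] or record["tax_due"] < 0:
            rejected.append(row)
        else:
            good.append(record)
    return good, rejected


# Made-up sample data, standing in for text scraped from the PDFs.
SAMPLE = '''address,assessed_value,tax_due
1 Example Street,"4,000,000","800,000"
,100,5
12 Example Avenue,not-a-number,10
'''

good, rejected = clean_rows(SAMPLE)
print(len(good), "clean rows;", len(rejected), "rows need review")
```

Rejected rows are kept rather than dropped, so someone can go back to the source PDF and see what went wrong.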











On the above map, the purple areas are tax exempt: museums, hospitals, and religious institutions. The highlighted property, 5 West 86th Street, is the building where I was born, and it pays nearly $800,000 in tax. Just by looking at the map we learned, for instance, that many hospitals and religious institutions also own apartment buildings that have zero tax bills. We all wondered how many people in the city were aware of this. Last week this was a prototype. This week, it’s a public beta. That’s the speed of Open Data.

Chris is a recent graduate of NYU School of Urban Planning and taught himself to code in 18 months. He said the code libraries available for Ruby let him build his app in less than 2 months. His map only covers the Upper West Side of Manhattan, but it was already fascinating to compare property tax values in those neighborhoods. We at IBM often think that cities have to curate their data before publishing, but what Chris demonstrated is that small teams of developers can work with what the City publishes (PDFs, incomplete data sets, errors, etc.) and clean it up as part of their app development. They aren’t using big application development environments and data cleansing tools.

I asked Chris why he built this application and who he thought would use it. He said he did it to educate himself on Ruby and to discover who was and was not paying property taxes in his neighborhood. He hopes that residents of New York will use the app to discover property tax variations and press for more equitable and consistent assessments.

After Chris, I gave a short presentation on the W3C Data on the Web Best Practices WG I am co-chairing. Most in the room had heard of W3C, but a few hadn’t. None had considered the role of open standards in Open Data, so we had a conversation about best practices in data citations, data quality, and comparability. One participant told me that the NYPD, NYFD, and EMS each have their own database of street names and often send their crews to the wrong addresses, because in New York there can be up to four names for the same street and each database uses a different one. For example:

Sixth Avenue is also:
Avenue of the Americas
Lenox Avenue (North of 106th)
Malcolm X Blvd (new name of Lenox)
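One common way to reconcile names like these is a shared alias table that maps every variant to a single canonical name. The Python sketch below is only an illustration of that idea: the table contains just the names listed above, grouped the way the participant described them, and the abbreviation list is an assumption. A real city gazetteer would be much larger and would also have to account for which stretch of the street each name applies to.

```python
import re

# Assumed abbreviations; extend as needed.
ABBREVIATIONS = {"ave": "avenue", "blvd": "boulevard"}

# Alias table built only from the names in the list above,
# all treated as one street, following the post.
CANONICAL = {
    "sixth avenue": "Sixth Avenue",
    "avenue of the americas": "Sixth Avenue",
    "lenox avenue": "Sixth Avenue",
    "malcolm x boulevard": "Sixth Avenue",
}


def normalize(name):
    """Lowercase, drop punctuation, collapse spaces, expand abbreviations."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def canonical_street(name):
    """Return the canonical street name, or None if the name is unknown."""
    return CANONICAL.get(normalize(name))


for variant in ("Avenue of the Americas", "Malcolm X Blvd.", "lenox ave"):
    print(variant, "->", canonical_street(variant))
```

Returning `None` for unknown names, rather than guessing, lets a dispatcher or a data-quality report flag the record instead of silently sending a crew to the wrong place.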

Very few politicians are even aware that NYPD, NYFD, and EMS have their own databases, much less that their street names differ. No cities are yet publishing their data with metadata that allows data to be compared within a city and across cities. Most cities don’t even know what data they really have, where it is, or how old it is. And few have any meaningful or standardized processes for determining how it should be published. It’s whatever seems to work at the time that gets it out there.

BetaNYC is focusing its efforts on developing applications that support the City Council, because politicians need more awareness of Open Data and BetaNYC wants to demonstrate its value by providing them with apps that make their jobs easier.

The last speaker was a graduate student at Columbia who is working on an Open Data project in Cambodia, mostly around AID Data. Many NGOs operating in Cambodia don’t have electronic records of aid projects, and the student was there to ask for advice on how to get this data. I was surprised that anyone was even thinking about Open Data in Cambodia.













This meetup was a great experience and I encourage everyone to join an Open Data meetup group in your local city.  These are engaged and passionate citizens working together and sharing skills and knowledge to build creative applications that solve urban problems.  We can only gain by participating and contributing.


  1. Steve,

    These are great examples of what people are doing with Open Data and also the ease of integrating and presenting various types of data from numerous and disparate sources.

    Thanks for providing this information!

  2. I have a question surrounding the line between governance on the Protection of Private Information (POPI) and the provision by an entity of its data, making it open data. The de-identification of private individuals’ information is only one aspect of the data. If a physical address (without the individual’s details) is made available in an open data set by an organisation, be it public or private, it is very simple to triangulate this information from another source and gain the information set required. This is of course the one side of the coin.

    The counterbalance to this is that, in South Africa, at any rate, the POPI act (signed into law, but the date for effectiveness has yet to be announced), provides that every human has to have the ability to update their information at any place of record. This creates the requirement for an individual to know which public or private institutions actually have their details on record.

    Open data would provide this opportunity; however, how then is this information still kept confidential? The web of complications gets thicker, but let’s start with this one conundrum. How are you addressing this governance requirement in your company or country?
