The Cable TV, Telecommunications and Public Utilities Committee conducted a public meeting on October 1, 2013 beginning at 7:18 p.m. at the Microsoft New England Research and Development Center, Thomas Paul Room, One Memorial Drive, Cambridge, Massachusetts.
The purpose of the meeting was to develop an open data ordinance for the City of Cambridge.
Present at the meeting were Councillor Leland Cheung, Chair of the Committee, Lisa Peterson, Deputy City Manager, Mary Hart, Chief Information Officer, Information Technology Department, Rebecca Rutenberg, Aide to Councillor Cheung, and Paula M. Crane, Administrative Assistant, City Clerk's Office.
Also present were Thad Kerosky, Ari Lev, Curt Savoie, Data Scientist, City of Boston, Calvin Metcalf, Lynn Chen, Marguerite Nyhan, MIT, Dan O'Brien, Research Director, Boston Area Research Initiative, Matt Cloyd, Metropolitan Area Planning Council, Sarah Laplante, Jessica Chengschrepe, ISite Design, Dave Rafland, Michael Wissner, Wissner Research, Beau Lyle, Kristen Merrill, Emily Dirsh, Red Hat, Inc., Nick Doiron, GIS Fellow, Boston, and Harlan Weber, Organizer, Code for Boston.
Councillor Cheung convened the meeting and welcomed the attendees. He stated that the Cable TV, Telecommunications and Public Utilities Committee has held a series of meetings to draft an open data ordinance for the City of Cambridge (Attachment A). Councillor Cheung stated that the meeting would start by reviewing the changes that have been made to the draft since the last open data meeting.
Councillor Cheung noted that the first change was to clarify the role of narrative in the ordinance's definition of data, located in 2.126.020.C. The updated draft removes the explicit exclusion of narrative formats from the definition of data. We recognize that in some instances, narrative may be helpful to understanding a data set, so we do not want City staff to feel as if they are prohibited from releasing narrative information when appropriate. Calvin Metcalf asked for clarification about why this change is necessary and whether non-data was subject to the same machine readability standards as data. Rebecca Rutenberg stated that the revisions to this section were made because, while we do not want to encourage the use of narrative formats when more granular methods are appropriate, we want to stress that City staff may voluntarily include narrative in their data when necessary. She asked what language changes would be more helpful. Calvin stated that it would be helpful to have a more formal definition of non-data and asked that narrative forms be machine readable and non-proprietary. Rebecca stated that a potential change would be to amend the text to read "including, when appropriate, narrative in machine readable text."
Councillor Cheung stated that the second change is to the definition of machine readable in 2.126.020.E. Rebecca said that this change was made because we do not want to tie the ordinance to a particular data output format that may become obsolete as technology evolves. It is important that whatever format is suggested is structured to allow automated processing. We do not want to encourage data to be submitted as scanned PDFs.
Rebecca noted that the third change was made under the definition of protected data number four. At the last meeting, many people expressed concern that defining protected data as data that is stored on a City-owned personal computing device or portion of a network that is assigned exclusively to a City employee could unintentionally become a safe harbor for data that employees do not want to release on a personal computer. To address that concern, the language was changed to specify that this should be data that is only found on a City-owned personal computing device or portion of the network that does not exist in other locations, such as works in progress. This should not exclude a data set simply because it is stored on a City computer. Michael Wissner stated that defining access to data by where it is stored is problematic. While the current definition includes data on a computer or a portion of the network, it does not make reference to data stored in the cloud or over GoogleDocs. In the current ordinance, data in the cloud would not be protected by this provision. Curt stated that this provision as written would be particularly problematic if it were ordained in Boston, as most work is done on GoogleDocs. Councillor Cheung stated that he would like to see a provision for this. Ari Lev stated that it is important to remember that data that would already be exempted by public records law on City-provided computers for employees would still not be subject even if this line did not exist. It also specifies City-owned computing and City employee. Rebecca stated that this item will be reviewed and may be stricken.
Rebecca stated that Change #4 was to Open Data Accessibility, Letter F. This change allows the Open Data Review Board to determine how granular the data should be. Granular data would ideally be as basic as possible so that people can do what they want with the data.
Rebecca stated that Change #5 is in Public Data Access, A. This gives the Information Technology Department the responsibility of providing and managing a website for the data.
Rebecca stated that Change #6 is in Public Data Access, C, to specify that the data be provided free of charge. In his submitted comments, John Hawkinson noted that the usage of and/or in this paragraph is confusing and not appropriate. This will be fixed. Calvin asked whether this section should specify that there are no use restrictions on the data. Examples of use restrictions include whether or not people can use the data to make apps and then charge for them, require users of their apps to click through a Terms of Service or agreement sheet before using the app, or whether all usage of the data must refer back to where the data originated from. Michael Wissner stated that the City should probably look at a GPL or MIT License. Calvin stated that the City should reference the open data site currently available from the White House, where there was discussion as to whether it was feasible to use open source licenses for their data. It doesn't seem that you can put any open source license on governmental data, but that will not stop people from requiring a click-through before accessing data. Councillor Cheung stated that we will not specifically outline use restrictions in the ordinance, but will not tie the ordinance to having no use restrictions should problems arise in the future.
Rebecca stated that Change #7 is in the Procurement section of the ordinance. In the last open data meeting, Curt Savoie provided interesting comments about the issue that arises with data ownership in third-party contracts. We have changed the ordinance to ensure that the Purchasing Agent can stipulate in contracts with external vendors, when appropriate, that the City retains ownership of all data collected.
Rebecca stated that the last set of changes was related to the Open Data Review Board's composition, powers, and responsibilities. In the last meeting, people expressed that they felt at least one member of the public should sit on the Open Data Review Board. This language has been changed to allow the City Manager to appoint a member of the public with a one year term, and to provide that no individuals acting on behalf of a corporation should be appointed. In his written comments, Saul Tannenbaum pointed out that the language as written makes it appear that there would only be two members of the Open Data Review Board. This is not the case and the language will be changed to reflect that. We also want to be sure that the City Manager will report annually on the implementation of the ordinance not just to the City Council, but to the public, as well. Calvin noted that the report should be provided in an open format. Ari asked for clarification of the language "individuals acting on behalf of a company". Rebecca noted the difference between an individual who was acting on behalf of a large software company in their role on the Open Data Review Board, which would be prohibited, and an individual who happened to work for a large software company but lived in Cambridge and was acting individually, which would be allowed. She noted the need to further clarify this and determine whether it should be included. John Hawkinson provided valuable feedback about this in his written comments.
Rebecca then noted that although all of the changes to the ordinance had been addressed, there were a few other topics in the ordinance on which feedback would be welcomed. She spoke about the definition of "data" and "data sets", which currently do not include image files such as designs, drawings, maps, photos, or scanned copies of original documents. She noted that nothing in the ordinance prohibits City staff from releasing this information and that she feels the language as written encourages City staff to open the information when appropriate, but that including it as a data type may make it okay for information to be submitted in a format that is not machine readable. She asked for comments. Curt Savoie stated that excluding maps could be a concern, referencing multilayered GIS data. Calvin noted many different types of files and data used, including geospatial data.
Rebecca stated that people have been concerned about the definition of protected data, specifically that it is overly prohibitive. We want to be sure that all data that is released does not jeopardize or have the potential to jeopardize public safety and that City staff can feel comfortable when they release data by knowing that they are abiding by a set of formal guidelines to protect the public interest. This comfort level is key in assisting them in transforming from data guardians to data stewards.
Curt stated that many of the things that are restricted in Protected Data would otherwise be restricted by other laws in Massachusetts, including Massachusetts Public Records Law and HIPAA. Where the uncertainty lies is in the other data - data that could be released that has never been released, and gets defaulted to a place of "no" as opposed to a place of "yes." There are a lot of things that are open to interpretation.
Lisa Peterson stated that from an administrative perspective, to default to yes can be a mentality that is viewed by some as very threatening. Custodians of data are nervous about making data open. She noted the importance of a cultural shift when thinking about open data. Stating that all data is open may create a negative reaction to the policy. There needs to be some level of comfort as it relates to this ordinance. Ari agreed that there needs to be a change in the mentality when approaching open data. Calvin stated that in his experience, when you put hurdles before people to classify something as protected data, they become more resistant to restricting data. When it is easy to say something is protected data, they can do it without thinking twice.
Curt stated that it comes down to changing the culture and the definition of what it means to be a data steward. Keeping data under lock and key because not doing so would be a hazard to public safety and welfare is one thing. It is another to do so because of the fear of a story leaking in the press or a department being bombarded with calls asking about the data. Stewardship does not mean locking down everything. He stated that a steward should be responsible to act in the public interest in order to publish information. He stated that teaching people about their own data is an important part. Oftentimes, they are unaware of what lies within their data set and what they can do with it. If the mentality is always to default to no, you will never get full value of your work and use it to its utmost potential. Ms. Peterson stated that it is not the intent of the City to give the impression that it is locking down data. She noted that the City wants to be thoughtful about what data sets are available.
Curt stated that once data is more regularly shared, there will be more feedback to use to determine which data sets are appropriate. Being a steward is about getting the staff involved and engaging with people in the community who want to use available data. It is an opportunity to bring the value of data full circle. It is important to make the data a partnership.
Harlan stated that he just received a request from Launch Academy, who would like to work with him on a civic project. He stated that he would love to utilize this group on a project for the City of Cambridge but noted that he needs to be properly armed to do this. This is another form of direct engagement with a developer.
Dan O'Brien stated that there is a lot of wiggle room for a City employee who wants to classify a certain data set as "protected." He asked what the role of the Open Data Review Board would be in determining what information is protected, and expressed his desire for the Board to play a mediating role. Rebecca noted that it is the intent of the ordinance for the Open Data Review Board to make determinations about what data should be protected and what must be released. Ms. Peterson stated that the City can be more explicit in the language relating to the Open Data Review Board and determining what is "protected" information.
Dan stated that there are many different fields within a database. Maybe the Open Data Review Board could be a facilitator in determining what fields can go and what fields cannot regarding the dissemination of information. There should be regular checks to ensure that protected data is not being released unintentionally.
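For illustration only, the field-level review Dan describes could be sketched as follows. This is a minimal sketch, not part of the draft ordinance: the field names, the PROTECTED_FIELDS set, the sample record, and all function names are hypothetical assumptions.

```python
import csv
import io

# Hypothetical set of fields a review body has marked as protected.
PROTECTED_FIELDS = {"resident_name", "phone"}

# Made-up sample record, used only to show the idea.
SAMPLE_ROWS = [
    {"permit_id": "P-1", "resident_name": "Example Person",
     "phone": "555-0100", "ward": "3"},
]


def release_public_fields(rows, protected=PROTECTED_FIELDS):
    """Return copies of the rows with every protected field removed."""
    return [
        {name: value for name, value in row.items() if name not in protected}
        for row in rows
    ]


def find_leaked_fields(rows, protected=PROTECTED_FIELDS):
    """Regular check: report protected field names still present in the rows."""
    leaked = set()
    for row in rows:
        # Intersecting with a dict compares against its keys (field names).
        leaked.update(protected.intersection(row))
    return leaked


def to_csv(rows):
    """Serialize the rows to CSV text, a machine readable format."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    public_rows = release_public_fields(SAMPLE_ROWS)
    # The check should find nothing once protected fields are stripped.
    print(find_leaked_fields(public_rows))
    print(to_csv(public_rows))
```

In this sketch the check runs on the rows that are about to be published, so a protected field that slips into an export is reported before release rather than after.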
Lisa Peterson spoke about rules and standards regarding information that should not be available.
Thad Kerosky stated that he would like to see language to define the role of data steward. That is a motivating factor behind the ordinance. Ms. Peterson stated that this is a good point.
Curt stated that if a department says "no" to everything, this will make for an adversarial atmosphere. Harlan stated that it must be a partnership between the Open Data Review Board representing the citizenship and the Department head representing the City. Calvin stated that citizens own the data.
Curt stated that Chicago has a departmental liaison who has had specific training on the laws and would have knowledge of their data. This is how Boston has dealt with their data portal. He has trained the departments, which helps to maintain the comfort level. Providing training for department liaisons as to what can be published, what is prohibited, and how to classify the gray areas will make people feel more at ease and help them make qualified decisions.
Ari stated that additional details and clarification as it relates to the Open Data Review Board and the policies of the board such as timelines for review, etc. will alleviate staff concerns.
Councillor Cheung stated that the City is not trying to be overly prescriptive in the ordinance, in order to allow the City Manager to make changes to the structure of the Board once we find out what is most effective. Ari stated that if a City employee determines that a request for data falls into a gray area, it is important to know the structure of how the Open Data Review Board will move forward in determining if said data should be open or protected. Ms. Peterson agreed that it is important to be clear regarding the policies of the Open Data Review Board, but she would not want that level of detail included in the ordinance because amending an ordinance of the City of Cambridge involves many steps. Ari stated that a clear structure alleviates anxiety around the process. Ari stated that if this information is not specified in the ordinance, the City should promulgate a process. Councillor Cheung stated that the process should be defined by the City Manager.
Calvin noted that the definition of CSV should be altered to include that it is a file type. JSON is another type that may be worth including. Rebecca stated that definition A for API and B for CSV had been included because they were previously used in the definition of the term machine readable. Because the definition of machine readable has been changed, these definitions are no longer necessary. Curt stated that as long as machine readable is covered, this should not be an issue. Calvin stated that it may be helpful to include language that would allow the Board to stipulate the most up-to-date and widely-used file formats. Rebecca stated that this language change could be included under Open Data Accessibility F, which allows the Open Data Review Board to determine granularity.
Kristen asked if it would be better to use the term structured text for machine readable.
Ari stated that another concern is that sometimes, although the data is machine readable, it is not human understandable. Matt Cloyd stated that MAPC database titles are limited to 80 characters, so when they are made public, a metadata file will be published as well so that people using the data set can best understand it. Including metadata would be helpful.
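As an illustration of the kind of metadata file Matt describes, the sketch below pairs short column names with human readable descriptions and flags any column that lacks one. It is a minimal sketch under stated assumptions: the column names, descriptions, and function names are hypothetical and do not reflect MAPC's actual format.

```python
import json


def build_data_dictionary(dataset_title, columns):
    """Build a metadata record from a mapping of column name to description."""
    return {
        "title": dataset_title,
        "columns": [
            {"name": name, "description": description}
            for name, description in columns.items()
        ],
    }


def undocumented_columns(header, dictionary):
    """Return the column names in the header that the metadata does not describe."""
    documented = {column["name"] for column in dictionary["columns"]}
    return [name for name in header if name not in documented]


if __name__ == "__main__":
    # Hypothetical short column names of the kind a length limit produces.
    dictionary = build_data_dictionary(
        "Example data set",
        {"pop_u18": "Population under 18 years of age"},
    )
    # Publish the metadata alongside the data as machine readable JSON.
    print(json.dumps(dictionary, indent=2))
    # "hh_inc" has no description yet, so it is reported.
    print(undocumented_columns(["pop_u18", "hh_inc"], dictionary))
```

Publishing the metadata as JSON keeps it machine readable as well, so the same file can serve both people reading the data set and programs consuming it.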
Harlan stated that some of the language in the ordinance is flexible and questioned what constitutes a "reasonable or best effort." He asked how to evaluate whether someone has made their best effort. What is a "significant amount"? When these are used, how do you determine that these are grounds for protecting a data set? Mary Hart stated that a large determinant in making these decisions is what is financially prudent at the time, which often cannot be defined because it changes based on the budget of the City. Rebecca stated the Open Data Review Board would provide detailed responses as to why a certain data set was classified as protected due to undue administrative or financial burden. Rebecca explained that, similar to Freedom of Information requests, the cost would be based on the lowest paid City employee capable of performing said research, in addition to other potential costs. These costs would all be detailed at length.
Dan stated that if the Board comes across a data set that poses an undue burden, it might be a good idea to consider engaging the hack or academic community to contribute to the process of making that data usable. Depending on content, you could find a researcher who would like to take the time to help manage this. Curt stated that it is a good thing to empower employees to provide messy data. Rebecca stated that when determining which resident to select for the Open Data Review Board, it is her hope that the City Manager would consider the connection to the tech and civic innovation communities. Someone with knowledge would be beneficial in this respect.
Thad noted that the City should consider keeping the definition of API so that it can be used in language re: username and password in Public Data Access C.
Councillor Cheung asked about any other issues that should be areas of discussion.
Rebecca stated that Councillor Cheung is interested in submitting a communication, such as an "Open Data FAQ", when the ordinance is submitted. She wants to be sure that the Council and the public have resources at their fingertips to understand what is being suggested by the ordinance. This communication could include questions such as "What is open data? How is it used? Is it different from big data? What data is private?". She stated that she can understand why the technical wording can be intimidating to people. Harlan stated that a lot of questions are answered on data.gov. Nick Doiron spoke about the Sunlight Foundation's "5 reasons not to open data", which provides informative details to demystify the five most common reasons people are afraid of opening data. They have good responses to questions.
Councillor Cheung stated that it would be beneficial if informational meetings could be scheduled with City Councillors to provide information and answer potential questions that City Councillors may have. Dan added that it would be beneficial to have some of the examples of sites that CFB has built with open data as well as an active demonstration. Rebecca stated that she is happy to assist in coordinating many of these meetings.
Councillor Cheung submitted communications from Saul Tannenbaum and John Hawkinson to be added into the record (Attachments B and C).
Councillor Cheung stated that legal review of the ordinance is forthcoming and spoke of next steps for submittal of this ordinance to the City Council.
Harlan stated that he has asked for feedback and commentary on the draft ordinance from Code for America Captains across the country by providing them with the link to the GoogleDoc.
Councillor Cheung stated that he would like this ordinance to be in place by year end.
Councillor Cheung thanked all attendees for their participation.
The meeting adjourned at 8:28 p.m.
For the Committee,
Councillor Leland Cheung, Chair
Cable TV, Telecommunications and Public Utilities Committee