II.B: Data.gov

Increase and improve EPA information on Data.gov

Other Information:

The Open Government Directive (OGD) sharpens the focus on public access to data, making data sets from across the federal government available to the public. To improve the public’s ability to discover, access and use data sets, the White House launched Data.gov, which provides links to data sets and data tools from multiple departments and agencies. The evolution of the Internet and its ability to reach a wider audience give federal agencies an opportunity to reach many more people.

In conjunction with the Data.gov launch, EPA began making high-value data sets available to the public via www.Geodata.gov and through EPA’s DataFinder tool (www.epa.gov/datafinder). EPA has strategically provided data sets to non-governmental groups since the 1980s. An example is the Clean Air Status and Trends Network (CASTNET), a regional long-term environmental monitoring program that assesses trends in acidic deposition due to emission reduction regulations, such as the Acid Rain Program and the NOx Budget Trading Program. Data from CASTNET can be downloaded from www.epa.gov/CASTNET/data.html.

In addition to making more data available to more people, we are increasing the speed at which we make data available. In response to stakeholder needs for earlier availability of environmental data, the Toxics Release Inventory (TRI) program processed and posted 88,000 toxic chemical reports on our Web site within 49 days of collection, in an easy-to-use downloadable file format. To increase transparency, we published the data before completing our own analysis, which encouraged outside stakeholders to conduct their own analyses within weeks of data collection. With the 2009 data we set a record: for the first time, the data was published in the same year it was collected.

To inventory all data sets for future inclusion on Data.gov, we have identified the following principles to prioritize data.
Our highest priority will be data which:

* Advances one or more of the Agency’s strategic goals/priorities.
* Responds to the feedback received on www.epa.gov/open and www.data.gov, and inspires new forms of community engagement.
* Enables third-party innovation by conforming to established best practices for data:
  * Primary: exposes the underlying source data, not aggregate statistics.
  * Structured: available in a machine-processable format such as XML or CSV.
  * Timely: includes the most recent data available and is updated on a regular basis.
  * Usable: provides an understandable description of the dataset and its context, and makes available the data schema and other relevant metadata.
  * Complete: includes all collected data of this type as described, except where constrained by privacy or legal barriers.
  * Quality: of appropriate and well-described quality for informed use by other parties.

Sections IV.B.3 and IV.B.4 provide details on how EPA data is already being used in innovative ways by the public, and how we have responded to recent data requests.

1. Identification and Publication of High-Value Information - EPA has taken full advantage of Data.gov, linking to more than 400 data sets and data tools to date that can be accessed on EPA’s Web site. We are developing a Strategic Data Action Plan to establish and implement EPA’s processes to increase transparency by more systematically managing and disseminating information. The plan will establish governance mechanisms, processes and technologies to institutionalize the requirements of the OGD and this plan as they pertain to our data sets and tools.

Components of the action plan will include:

* Inventorying EPA’s high-value information currently available for download. (Note: The first version of this inventory, which includes 427 raw datasets, 37 tools, and 147 Geodata sets, can be found at: http://www.data.gov/catalog/raw/category/0/agency/4/filter/type/sort/page/1/count/25.)
* Identifying high-value information not available for download that should be considered for dissemination.
* Prioritizing and scheduling new high-value information to post for download.
* Soliciting ongoing public suggestions and feedback.
* Improving the presentation of data in open formats to improve the public’s ability to use the data.
* Improving the use of additional approaches such as web services, Application Programming Interfaces (APIs), linked open data/semantics and descriptive metadata to improve service to the public.
* Identifying key information gaps where usable information is not available.
* In addition, we want your input to drive our next set of releases. Please go to www.epa.gov/open and provide us with your feedback. We will review these requests and will publish additional data sets on a quarterly basis. We want you to drive our priorities.

We will examine how to use or adapt existing mechanisms such as the EPA Science Inventory and the EPA Data Finder to identify and manage high-value information. Where appropriate, the Strategic Data Action Plan will be integrated with EPA’s Strategic Plan, capital investment planning process and budget formulation process. We will make at least five new high-value data sets available on Data.gov in FY2010 and five more in FY2011.

We will also seek and consider public input on the types of data sets and data tools that may be of value outside the Agency. To determine what the public is most interested in, we will monitor entries in “suggest data sets” from Data.gov, suggestions from the OpenEPA Web site and other sources. EPA will also consider contests to encourage interest and suggestions.

2. Timely Publication of Open Formatted Data - EPA’s goal is to improve both the quality and the quantity of the data sets we contribute to Data.gov.
We will release the following five data sets in Q3/Q4 of FY2010. Each data set supports a key EPA priority: improving air quality, protecting our water, and/or working to decrease the effects of climate change.

* NHDPlus - EPA, assisted by the US Geological Survey, supported the development of NHDPlus to enhance watershed planning and analysis. NHDPlus is an integrated suite of application-ready geospatial data sets that incorporate many of the best features of the National Hydrography Dataset (NHD), the National Elevation Dataset (NED), the National Land Cover Dataset (NLCD), and the Watershed Boundary Dataset (WBD). The integration of national data sets provides users with the framework and tools to support a wide variety of water-related applications used for strategic decision making.
* Clean Water State Revolving Fund (CWSRF) - Congress created the CWSRF program in 1987 to serve as a long-term funding source for projects that protect and restore the Nation’s waters. During the last two decades, the CWSRF has provided low-interest loans targeting a wide range of projects in areas like wastewater treatment, non-point source pollution control, estuary management, and a host of projects focusing on water quality. It is the largest federal funding program for wastewater infrastructure projects across the country.
* Drinking Water State Revolving Fund (DWSRF) - The Safe Drinking Water Act, as amended in 1996, established the DWSRF to make funds available to drinking water systems to finance infrastructure improvements. We must make significant investments in our Nation’s water systems to install, upgrade, or replace infrastructure to continue to ensure the provision of safe drinking water to 240 million customers. Installation of new treatment facilities can improve the quality of drinking water and better protect public health. Improvements are also needed to help those water systems experiencing a threat of contamination due to aging infrastructure systems. The program also emphasizes providing funds to small and disadvantaged communities and to programs that encourage pollution prevention as a tool for ensuring safe drinking water.
* The American Recovery and Reinvestment Act (ARRA) of 2009 provided the CWSRF and DWSRF programs with billions of dollars to fund high-priority wastewater and drinking water infrastructure improvement projects. In support of ARRA, CWSRF and DWSRF are working on publishing the underlying grant data supporting ARRA projects to ensure transparency and accountability over public tax dollars.
* Integrated Climate and Land Use Scenarios (ICLUS) - Climate change interacts with existing and future land uses, such as residential housing and roads. Until now, there have been no scenarios of land-use changes for the U.S. that are consistent with the storylines of population growth, greenhouse-gas emissions, and socio-economic changes used by climate-change modelers. The lack of these consistent scenarios has impeded progress on integrated assessments of the effects of climate and land-use change on endpoints of concern, such as water quality, aquatic ecosystems, air quality, and human health. The ICLUS scenario data depict anticipated future patterns in housing density and impervious surface across the United States from 2000 to 2100, by decade.
* Green Vehicle Guide - EPA’s Green Vehicle Guide provides vehicle ratings based on emissions and fuel economy. The downloadable data for the current model year are available in text or spreadsheet (XLS) formats, and a data extraction tool is available for model years 2001 through the current year.
To continue expanding what is available on Data.gov, our Strategic Data Action Plan will address how we will evaluate and select the underlying (supporting) data that will be made publicly available in downloadable, open formats and catalogued in Data.gov. The plan will explain the process we will develop for making those data sets available. For example, we will improve our data management through a publicly accessible data set catalogue that will support Data.gov as well as any other sources for accessing the data sets, such as Data Finder. We will make other information available to increase the usability of our data sets, including definitions for the fields in the data sets and information about services, such as APIs, that can be used on the data sets. This plan will be available in FY2011.

Specific milestones include the following:

* Make 5 additional high-value data sets available (Q3/Q4, FY2010)
* Publish the Strategic Data Action Plan (Q2, FY2011)
* Make 5 additional high-value data sets available (Q3/Q4, FY2011)
* Define processes to identify innovative uses of data (Q4, FY2010 & Q1, FY2011)
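
As an illustration only, the Python sketch below shows how one entry in a data set catalogue might be screened against the prioritization principles described earlier in this section (Primary, Structured, Timely, Usable, Quality). The record layout, field names, the one-year timeliness threshold and the set of accepted formats are all assumptions made for this example; they are not an EPA or Data.gov schema. The Complete principle is left out because it cannot be judged from catalogue metadata alone.

```python
from dataclasses import dataclass, field
from datetime import date

# Formats the section names as examples of machine-processable data.
# Treating only these two as "structured" is an assumption of this sketch.
OPEN_FORMATS = {"CSV", "XML"}


@dataclass
class CatalogueEntry:
    """Hypothetical catalogue record; every field name here is illustrative."""
    title: str
    description: str
    file_format: str                 # e.g. "CSV", "XML", "PDF"
    last_updated: date
    is_source_data: bool             # True for underlying data, False for aggregates
    field_definitions: dict = field(default_factory=dict)
    quality_notes: str = ""


def principles_met(entry: CatalogueEntry, today: date, max_age_days: int = 365) -> dict:
    """Return a principle-name -> bool map for one catalogue entry."""
    age_days = (today - entry.last_updated).days
    return {
        "Primary": entry.is_source_data,
        "Structured": entry.file_format.upper() in OPEN_FORMATS,
        "Timely": age_days <= max_age_days,
        # Usable: needs both a description and definitions for the data fields.
        "Usable": bool(entry.description.strip()) and bool(entry.field_definitions),
        "Quality": bool(entry.quality_notes.strip()),
    }


if __name__ == "__main__":
    example = CatalogueEntry(
        title="Example data set",
        description="Placeholder description for illustration.",
        file_format="csv",
        last_updated=date(2010, 1, 1),
        is_source_data=True,
        field_definitions={"site_id": "Placeholder field definition"},
        quality_notes="Placeholder quality statement.",
    )
    for name, ok in principles_met(example, today=date(2010, 6, 30)).items():
        print(f"{name}: {'met' if ok else 'not met'}")
```

A screen of this kind could only flag candidates for review. Deciding whether a data set advances Agency priorities or responds to public feedback would still require human judgment.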

Indicator(s):