Data Management Plan (Deprecated)
GUID: gov.noaa.nmfs.inport:27377 | Published / External
This is an outdated version of the NOAA Data Management Plan template. InPort now supports a dedicated Data Management Plan Catalog Item type, which is up-to-date with the latest NOAA DMP template. The ability to generate Data Management Plans from Data Sets will be discontinued in a future release. Please see the Data Management Plan Help Guide to learn more.
Data Management Plan
DMP Template v2.0.1 (2015-01-01)
Please provide the following information, and submit to the NOAA DM Plan Repository.
Reference to Master DM Plan (if applicable)
As stated in Section IV, Requirement 1.3, DM Plans may be hierarchical. If this DM Plan inherits provisions from a higher-level DM Plan already submitted to the Repository, then this more-specific Plan only needs to provide information that differs from what was provided in the Master DM Plan.
1. General Description of Data to be Managed
A raster with 20 m resolution and decimal values was assembled from 18.6 billion bathymetric soundings obtained from the National Centers for Environmental Information (NCEI), https://www.ncei.noaa.gov. The bathymetric soundings extend from the Kuril-Kamchatka Trench in the Bering Sea along the Aleutian Trench to the Gulf of Alaska, and in the Arctic Ocean from Prince Patrick Island to the International Date Line. Bathymetric soundings were scrutinized for accuracy using statistical analysis and visual inspection, with some imputation. Editing processes included deleting erroneous and superseded values, digitizing missing values, and referencing all data sets to a common, modern datum.
Notes: Only a maximum of 4000 characters will be included.
Notes: Data collection is considered ongoing if a time frame of type "Continuous" exists.
Notes: All time frames from all extent groups are included.
Alaska and surrounding waters
Notes: All geographic areas from all extent groups are included.
(e.g., digital numeric data, imagery, photographs, video, audio, database, tabular data, etc.)
(e.g., satellite, airplane, unmanned aerial system, radar, weather station, moored buoy, research vessel, autonomous underwater vehicle, animal tagging, manual surveys, enforcement activities, numerical model, etc.)
2. Point of Contact for this Data Management Plan (author or maintainer)
Notes: The name of the Person of the most recent Support Role of type "Metadata Contact" is used. The support role must be in effect.
Notes: The name of the Organization of the most recent Support Role of type "Metadata Contact" is used. This field is required if applicable.
3. Responsible Party for Data Management
Program Managers, or their designee, shall be responsible for assuring the proper management of the data produced by their Program. Please indicate the responsible party below.
Notes: The name of the Person of the most recent Support Role of type "Data Steward" is used. The support role must be in effect.
4. Resources
Programs must identify resources within their own budget for managing the data they produce.
5. Data Lineage and Quality
NOAA has issued Information Quality Guidelines for ensuring and maximizing the quality, objectivity, utility, and integrity of information which it disseminates.
(describe or provide URL of description):
Lineage Statement:
- Currently, our process keeps a record of the survey from which each point originated. Some older data does not have this level of metadata.
Servers are crawled for relevant data, and a list of download URLs to data files is returned. Data files are then retrieved using custom Python scripts.
Raw data is downloaded from online NCEI web servers.
Data is converted to CSV or XYZ files.
Points missing one or more of their X, Y, or Z values are removed and archived.
Data is evaluated as a component of the bathymetry map to identify outliers (instances where data point(s) are not consistent with the expected variability of the surrounding environments) using a variety of statistical and manual methods.
K-Nearest Neighbors (KNN) - Python and SciPy
Percentiles with Standard Deviations - Python and SciPy
Slope and Neighbors - ArcGIS Models
Manual/visual selection - Human
Imputation of points to make them consistent with surrounding terrain tracklines and satellite altimetry.
Upon integration into the dataset, between 0% and 25% of the deepest and shallowest points are immediately removed from each trackline, based on the standard deviation of each individual trackline.
All data is archived; a dataset with the outliers could be built within 7 weeks.
The percentage of points removed (R) is determined by a non-linear function of the dataset's standard deviation (s), and can be seen below:
R = -4.746813 + 30.059 / (1 + (s / 9.584625)^0.9983)
This function was derived using a best-fit curve tool, which was instructed to return a naturally-logarithmic function that was equal to ~25 when s = 0 and that decreased asymptotically to 0 as s grew larger. The function was then tailored to have what the developer felt was a reasonable slope.
The logic here is that datasets with low standard deviations would be relatively flat and featureless. Since they have a lower level of topographic complexity, they can undergo a higher rate of removal while still retaining the essential topographic character of the surface they represent.
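The removal-rate formula above can be evaluated directly. The following is a minimal sketch, not the project's actual script: the function name removal_percentage is hypothetical, and the clamping to the 0-25% range is an assumption taken from the stated range, since the raw expression evaluates to slightly more than 25 at s = 0 and becomes negative for large s (roughly s > 51).

```python
def removal_percentage(s: float) -> float:
    """Percentage of points to remove for a trackline with standard deviation s.

    Evaluates R = -4.746813 + 30.059 / (1 + (s / 9.584625)^0.9983).
    """
    if s < 0:
        raise ValueError("standard deviation must be non-negative")
    r = -4.746813 + 30.059 / (1.0 + (s / 9.584625) ** 0.9983)
    # Assumption: clamp to the 0-25% range stated in the text. The raw
    # expression is slightly above 25 at s = 0 and negative for large s.
    return min(25.0, max(0.0, r))


if __name__ == "__main__":
    # Flat tracklines (small s) lose more points than rugged ones (large s).
    for s in (0.0, 5.0, 10.0, 25.0, 50.0, 100.0):
        print(f"s = {s:6.1f}  ->  remove {removal_percentage(s):5.2f}% of points")
```

Running the module prints the removal rate for a few sample standard deviations, which makes the decreasing shape of the curve easy to check.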
Data is then organized into a Kd-Tree structure, in which data points are sorted on a different axis at each level of the tree (i.e., the first level is split along the x axis, the next level along the y axis, the next along the z axis, and the fourth along the x axis again). The result is a tree that can be searched in O(log(n)) time and that is optimized for quick spatial searches, which are critical for the next step.
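The axis-cycling structure described above can be illustrated with a small pure-Python Kd-tree. This is only a sketch under assumptions: the names Node, build_kdtree, and nearest are hypothetical, the median split is one common construction choice, and a production workflow would more likely rely on a library implementation (the lineage statement names Python and SciPy).

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z)


class Node(NamedTuple):
    point: Point
    axis: int  # 0 = x, 1 = y, 2 = z
    left: Optional["Node"]
    right: Optional["Node"]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a Kd-tree, cycling the split axis x -> y -> z -> x by depth."""
    if not points:
        return None
    axis = depth % 3
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median point becomes this level's node
    return Node(
        ordered[mid],
        axis,
        build_kdtree(ordered[:mid], depth + 1),
        build_kdtree(ordered[mid + 1:], depth + 1),
    )


def nearest(
    node: Optional[Node],
    target: Point,
    best: Optional[Tuple[float, Point]] = None,
) -> Optional[Tuple[float, Point]]:
    """Return (squared distance, point) of the stored point closest to target."""
    if node is None:
        return best
    dist = sum((a - b) ** 2 for a, b in zip(node.point, target))
    if best is None or dist < best[0]:
        best = (dist, node.point)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far branch only if the splitting plane is closer than the
    # best match found so far; this pruning is what keeps searches fast.
    if diff ** 2 < best[0]:
        best = nearest(far, target, best)
    return best
```

For example, nearest(build_kdtree(points), (x, y, z)) returns the closest stored sounding to a query location without comparing against every point.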
A K-Nearest-Neighbors (KNN) statistical model is then applied to the data. This uses the values of each data point’s K spatially nearest neighbors (the k-value) to produce an ‘expected value.’ The expected value is then subtracted from the point’s observed value, and the absolute value of this difference is the point’s ‘residual.’ After calculating the residuals for all of the points in the data set, we remove the 5% of points with the highest residuals.
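The residual calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, neighbors are found in the horizontal (x, y) plane, the expected value is taken to be the unweighted mean of the neighbors' z values (the text does not say how the expected value is formed), and a brute-force neighbor search stands in for the Kd-tree and SciPy tooling so the example stays self-contained.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z)


def knn_residuals(points: Sequence[Point], k: int) -> List[float]:
    """Absolute difference between each point's z and its neighbors' mean z."""
    if not 1 <= k < len(points):
        raise ValueError("k must be at least 1 and smaller than the point count")
    residuals = []
    for i, (x, y, z) in enumerate(points):
        # Brute-force search: horizontal distance to every other point.
        others = [
            (math.hypot(x - px, y - py), pz)
            for j, (px, py, pz) in enumerate(points)
            if j != i
        ]
        others.sort(key=lambda item: item[0])
        # Assumption: expected value = unweighted mean of the k nearest z values.
        expected = sum(pz for _, pz in others[:k]) / k
        residuals.append(abs(z - expected))
    return residuals


def drop_highest_residuals(
    points: Sequence[Point], k: int, fraction: float = 0.05
) -> Tuple[List[Point], List[Point]]:
    """Split points into (kept, removed), removing the top `fraction` by residual."""
    residuals = knn_residuals(points, k)
    n_drop = int(len(points) * fraction)
    order = sorted(range(len(points)), key=lambda i: residuals[i], reverse=True)
    dropped = set(order[:n_drop])
    kept = [p for i, p in enumerate(points) if i not in dropped]
    removed = [p for i, p in enumerate(points) if i in dropped]
    return kept, removed
```

Returning the removed points alongside the kept ones mirrors the plan's practice of archiving removed data rather than deleting it.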
After these steps are completed, the remaining data are converted to Feature Classes. This data structure is composed of not only the raw data, but also a host of metadata derived from the raw data, such as vessel name and trackline number. Spatial indexes are added to the data to optimize operations. All internal data is point data and is stored in the Alaska Albers projection.
ArcMap and ArcGIS Pro Slope and Neighbor Outlier Tools.
The ArcMap and ArcGIS Pro function analyzes each data point based on the slope of the rendered terrain polygon and the point’s immediate adjacent neighbors. If a sufficient portion of these slopes exceeds a manually pre-defined threshold, the point is flagged as a potential outlier but not removed. After this function has identified all potential outliers, the set is visually reviewed and flagged. Flagged points are manually removed from the terrain but stored as an independent shapefile, so no data removed from the active dataset is truly deleted.
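The flag-but-do-not-remove logic described above can be approximated outside ArcGIS. The sketch below is a plain Python stand-in, not the ArcGIS model itself: the function name and the k, max_slope_deg, and min_fraction parameters are hypothetical, slopes are measured point-to-point to the k horizontally nearest neighbors rather than on a rendered terrain surface, and the default threshold values are placeholders.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z)


def flag_slope_outliers(
    points: Sequence[Point],
    k: int = 4,
    max_slope_deg: float = 30.0,
    min_fraction: float = 0.75,
) -> List[int]:
    """Return indexes of points whose slopes to most neighbors are too steep.

    Points are only flagged for review; nothing is removed here.
    """
    flagged = []
    for i, (x, y, z) in enumerate(points):
        # Assumption: "immediate adjacent neighbors" = k nearest in (x, y).
        neighbors = sorted(
            (p for j, p in enumerate(points) if j != i),
            key=lambda p: math.hypot(p[0] - x, p[1] - y),
        )[:k]
        steep = 0
        for nx, ny, nz in neighbors:
            run = math.hypot(nx - x, ny - y)
            slope_deg = math.degrees(math.atan2(abs(nz - z), run))
            if slope_deg > max_slope_deg:
                steep += 1
        # Flag when a sufficient portion of the slopes exceeds the threshold.
        if neighbors and steep / len(neighbors) >= min_fraction:
            flagged.append(i)
    return flagged
```

The returned indexes would then go to visual review, consistent with the manual confirmation step in the lineage statement.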
(describe or provide URL of description):
K-Nearest Neighbors, percentiles, and ArcGIS slope tools were used to locate and remove outliers.
6. Data Documentation
The EDMC Data Documentation Procedural Directive requires that NOAA data be well documented, specifies the use of ISO 19115 and related standards for documentation of new data, and provides links to resources and tools for metadata creation and validation.
Missing/invalid information:
- 1.7. Data collection method(s)
- 7.2. Name of organization of facility providing data access
(describe or provide URL of description):
7. Data Access
NAO 212-15 states that access to environmental data may only be restricted when distribution is explicitly limited by law, regulation, policy (such as those applicable to personally identifiable information or protected critical infrastructure information or proprietary trade information) or by security requirements. The EDMC Data Access Procedural Directive contains specific guidance, recommends the use of open-standard, interoperable, non-proprietary web services, provides information about resources and tools to enable data access, and includes a Waiver to be submitted to justify any approach other than full, unrestricted public access.
via REST Services. Not for navigation. Analysis only.
Notes: The name of the Organization of the most recent Support Role of type "Distributor" is used. The support role must be in effect. This information is not required if an approved access waiver exists for this data.
Notes: This field is required if a Distributor has not been specified.
https://alaskafisheries.noaa.gov/arcgis/rest/services/bathy_40m/MapServer
Notes: All URLs listed in the Distribution Info section will be included. This field is required if applicable.
By Raster format download
NA
Notes: This field is required if applicable.
8. Data Preservation and Protection
The NOAA Procedure for Scientific Records Appraisal and Archive Approval describes how to identify, appraise and decide what scientific records are to be preserved in a NOAA archive.
(Specify NCEI-MD, NCEI-CO, NCEI-NC, NCEI-MS, World Data Center (WDC) facility, Other, To Be Determined, Unable to Archive, or No Archiving Intended)
Notes: This field is required if archive location is World Data Center or Other.
Notes: This field is required if archive location is To Be Determined, Unable to Archive, or No Archiving Intended.
call or email or visit web
Notes: Physical Location Organization, City and State are required, or a Location Description is required.
Discuss data back-up, disaster recovery/contingency planning, and off-site data storage relevant to the data collection
NA
9. Additional Line Office or Staff Office Questions
Line and Staff Offices may extend this template by inserting additional questions in this section.