Data Management 101

Overview

This guide introduces key data management concepts and offers best practices for managing data from creation to destruction, including how to store, assure, secure, and monitor data at UW–Madison.

Audience: UW–Madison staff, faculty, and students

This is part of the Data Literacy & Training series.


 

Learning Objectives

  1. Identify and compare options for secure data storage on campus
  2. Become aware of data integrity issues and mitigation techniques
  3. Understand basic data retention and archival best practices

Data Management

[Figure: The data collection stage involves four activities: store, assure, secure, and monitor.]

At UW–Madison, many projects require some aspect of data collection including:

  • Data collected from others, such as survey data or student grades
  • Data as a part of your research or scholarship (e.g., collected in the lab or extracted from text or archival records)
  • Data entered into systems, such as student profiles, employee records, or financial transactions

Data management is a collection of practices, concepts, and processes that help maintain data integrity, quality, security, and usability throughout the data lifecycle. Data management best practices, policies, and tools help us appropriately steward data from the moment it is created through the last day it is retained.


Data Collection

Various data collection techniques exist. The two most common methods are quantitative and qualitative data collection. 

Quantitative data collection methods often involve some form of structured data collection (e.g., instrument readings) or random sampling. Data are collected to test a hypothesis. Example methods include:

  • Experiments/clinical trials
  • Recording observable events (e.g., counting the number of tree rings)
  • Surveys with fixed responses and well-structured interviews

Qualitative data collection methods are often more unstructured and open-ended. While the results may be less generalizable, they can suggest new directions that the researcher can use to refine or strengthen quantitative approaches. Examples include:

  • Open-ended survey questions or pilot surveys
  • Interviews
  • Document review

Data Storage and Backup

Collected or acquired data must be stored in a stable and secure environment. This involves continuous monitoring for hardware failure, controlled access with password protection, and robust backup and recovery services to avoid permanent data loss or destruction. UW–Madison has a range of secure data storage options managed by DoIT and in the cloud (e.g., Box, Google Drive).

Tip: Use the Data Storage Finder tool to compare and select the best campus data storage option for your needs: https://storage.researchdata.wisc.edu/

Data Inventory

A data inventory, or a data catalog, helps us understand what data we must manage. Storage providers monitor their assets using automated tools for scanning a storage device and creating a report of all contents and their attributes including file name, file format, size, data classification (sensitivity level), data type (e.g. regulations) and dates of creation and modification.

There are several ways to get an output of what files you have.

  • Use the “tree” command to print out the contents of a directory from the command line
  • Use Karen’s Directory Printer for a tool-based approach to data inventories
  • Use more advanced forensic tools like Identity Finder to also search within the files
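
For a scripted alternative to the tools above, the following is a minimal Python sketch of a file inventory. It records only attributes the file system exposes (relative path, format taken from the file extension, size, and date last modified). The function names `build_inventory` and `write_inventory_csv` are illustrative, not part of any campus tool. Creation dates are left out because they are not reported consistently across operating systems, and data classification cannot be read from the file system, so it would have to be added by hand.

```python
import csv
import os
from datetime import datetime, timezone
from pathlib import Path


def build_inventory(root):
    """Walk `root` and return one record per file with basic attributes."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            records.append({
                "path": str(path.relative_to(root)),
                # File format is approximated by the extension, lowercased.
                "format": path.suffix.lower().lstrip(".") or "(none)",
                "size_bytes": info.st_size,
                "modified": modified.isoformat(timespec="seconds"),
            })
    return sorted(records, key=lambda record: record["path"])


def write_inventory_csv(records, out_path):
    """Save the inventory as a CSV file that opens in any spreadsheet tool."""
    fields = ["path", "format", "size_bytes", "modified"]
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
```

Calling `write_inventory_csv(build_inventory("my_project"), "inventory.csv")`, where `my_project` is a placeholder folder name, would produce a spreadsheet-ready listing of that folder.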

Tip: Try searching RADAR to discover UW–Madison data products, dashboards, and reports.

Data Assurance

According to DataONE, data quality assurance and data quality control (QA/QC) are strategies for:

  • Preventing errors from entering a dataset
  • Ensuring data quality for entered data
  • Monitoring and maintaining data quality throughout the project

High-quality data is, broadly speaking, data that is fit for use. Specific metrics or dimensions of data quality, such as the completeness, accuracy, or consistency of the data, must first be defined in order to evaluate data quality.

Tip: The UW-Madison Institutional Data Policy requires that “The quality and integrity of institutional data and institutional data products shall be actively managed and explicit criteria for data validity, availability, accessibility, interpretation, and ease of use shall be established.”

A related concept is data integrity. Once high-quality data have been established, the data must be protected from unauthorized alteration, deletion, or addition in order to maintain data integrity.
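
One common way to detect, though not prevent, alteration of a file is to record a checksum when the data are finalized and compare against it later. The sketch below uses SHA-256 from Python's standard library. The function names are illustrative, and the approach is an example technique rather than a stated campus requirement. Detecting deleted or added files would additionally need a stored list of expected files.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """True if the file's current checksum matches the one recorded earlier."""
    return sha256_of_file(path) == recorded_checksum
```

Store the recorded checksum separately from the data file so that a single change cannot alter both.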

Data quality controls are generally based around the data quality dimensions chosen by the organization. For example, one common business rule is definitional conformance, meaning that data definitions should be the same throughout an institution. This is supported at UW-Madison through the use of common definitions, such as those found in the Glossary of Terms in Administrative Dashboards and Reports. Data quality and integrity are also supported through the documentation of data elements and datasets based on the Institutional Data Documentation Standard. Other common types of business rules for data quality include those related to missing data values, correct format and range, consistency, accuracy, and the timeliness of data accessibility and availability.
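
As a rough illustration of business rules for missing values and for correct format and range, the sketch below checks a single record against a list of required fields and a set of numeric ranges. The field names used with it (`student_id`, `credits`) and the range limits are hypothetical examples, not UW–Madison definitions.

```python
def check_record(record, required_fields, ranges):
    """Return a list of human-readable data quality issues for one record."""
    issues = []

    # Rule 1: required fields must be present and non-blank.
    for field in required_fields:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            issues.append(f"{field}: missing value")

    # Rule 2: fields with a range must be numeric and fall inside it.
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is None or str(value).strip() == "":
            continue  # nothing to range-check; Rule 1 reports required fields
        try:
            number = float(value)
        except (TypeError, ValueError):
            issues.append(f"{field}: not a number ({value!r})")
            continue
        if not low <= number <= high:
            issues.append(f"{field}: {number} outside {low}-{high}")

    return issues
```

For example, `check_record({"student_id": "123", "credits": "12"}, ["student_id", "credits"], {"credits": (0, 30)})` returns an empty list, meaning no issues were found under these assumed rules.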

Tip: Data quality issues in UW–Madison institutional data can be reported through the Institutional Data Issue Submission Form.

Data Security

The level of protection or access restrictions placed on our institutional data often depends on the potential risk posed to the University and its affiliates by unauthorized disclosure, alteration, loss, or destruction of that data. UW–Madison’s data classification categories are:

Public: Little or no risk. Data should be classified as Public prior to display on the web or published without access restrictions

Internal: Low to moderate risk. By default, all Institutional Data that is not explicitly classified as Restricted, Sensitive or Public data should be treated as Internal data

Sensitive: Moderate to high risk. Data should be classified as Sensitive if the loss of confidentiality, integrity or availability of the data could have a serious adverse effect on university operations, assets or individuals

Restricted: High risk. Data should be classified as Restricted if protection of the data is required by law or regulation or UW–Madison is required to self-report to the government and/or provide notice to the individual if the data is inappropriately accessed

Tip: The University manages restricted data according to our Restricted Data Security Management policy.

Examples of Restricted Data

  • A staff member within the department of Academic Planning and Institutional Research submits enrollment totals to the Higher Learning Commission
  • A department administrator is looking at student counts by course for department planning purposes
  • A staff member within University Health Services is scheduling a clinic appointment for a student
  • A staff member within Rec Sports is renting a student a locker and looks up the student record within their local application
  • A department administrator is tracking Teaching Assistant information for the TAs within their department
  • A faculty member or lecturer is looking at a class roster
  • A school/college IT application developer is building an application to assist their department in tracking information about the students within that school/college
  • A principal investigator is working on an approved research project that involves UW–Madison students and data collected about those students

Tip: UW–Madison’s Data Classification policy provides a full list of sensitive and restricted data examples.

Monitor Over Time

How often data is used can be an important indicator of its value. Data retention should be based on local policy and on the anticipated usefulness of the data over time. Data use can be assessed by analyzing file properties such as date last modified or date last opened, or through more sensitive monitoring techniques such as download logs.
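
As one way to act on file properties, the sketch below lists files under a folder that have not been modified within a given number of days. The function name and the `days` threshold are illustrative. It relies on the date last modified because many systems do not update the date last opened reliably. A long-unmodified file is only a candidate for review against the applicable retention schedule, not proof that it can be deleted.

```python
import os
import time
from pathlib import Path


def files_not_modified_since(root, days, now=None):
    """Return paths under `root` whose last modification is older than `days`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime < cutoff:
                stale.append(str(path))
    return sorted(stale)
```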

Choosing what data to keep is an important and often overlooked step in the data lifecycle. Some data may require indefinite preservation, which can be costly and can pose risks such as unauthorized access or loss of integrity. Data retained by UW–Madison as a public institution may be subject to Wisconsin Public Records laws.

Tip: Learn more about public records requests from the UW–Madison Office of Compliance.

Only keep data that are necessary for future use or required by law. One way to determine how long to keep data is to consult with an archivist who can help create a records retention schedule.

Tip: UW–Madison Archives can help units and staff apply existing schedules or create new ones.

Key Data Management Terms

Data management is a collection of practices, concepts, and processes that help maintain data integrity, quality, security, and usability throughout the data lifecycle. DAMA International’s Data Management Body of Knowledge, 2nd edition (DAMA-DMBOK2), defines 11 data management concepts:

  • Data Governance: Planning, oversight, and control over management of data and the use of data and data-related resources
  • Data Architecture: The overall structure of data and data-related resources as an integral part of the enterprise architecture
  • Data Modeling and Design: Analysis, design, building, testing, and maintenance
  • Data Storage and Operations: Structured physical data assets storage deployment and management
  • Data Security: Ensuring privacy, confidentiality and appropriate access
  • Data Integration and Interoperability: Acquisition, extraction, transformation, movement, delivery, replication, federation, virtualization and operational support
  • Reference and Master Data: Managing shared data to reduce redundancy and ensure better data quality through standardized definition and use of data values
  • Data Warehousing and Business Intelligence (BI): Managing analytical data processing and enabling access to decision support data for reporting and analysis
  • Document and Content: Storing, protecting, indexing, and enabling access to data found in unstructured sources (e.g., electronic files and physical records), and making this data available for integration and interoperability with structured (database) data
  • Metadata: Collecting, categorizing, maintaining, integrating, controlling, managing, and delivering metadata
  • Data Quality: Defining, monitoring, maintaining data integrity, and improving data quality
Source: Cupoli, Patricia, S. Earley, and D. Henderson. “DAMA International’s Data Management Body of Knowledge 2nd edition (DAMA-DMBOK2)”. DAMA International (2017).