Using data ethically

Overview

Data literacy training for UW–Madison employees, faculty, and students to effectively and ethically use institutional data.

Learning objectives

Gain skills in using institutional data
Identify ways to document data transformations and analysis
Understand the key responsibilities and ethics of conveying data

Data, Information, and Knowledge

Figure: Relationships amongst knowledge, information, and data, from Liew, 2007, as illustrated by Medged, 2018.

The distinction between data, information, and knowledge is a critical component of the Data Use stage. Carol Tenopir, as quoted in Zins, 2007, defines these concepts as:

Data are facts that are the result of observation or measurement.
Information is meaningful data, or data arranged or interpreted in a way that provides meaning.
Knowledge is internalized or understood information that can be used to make decisions.

Using data to create information or make decisions involves thinking critically, applying appropriate transformations and analytical techniques, considering bias, and conveying your interpretation of the data as information.

Ask the right questions

Not all questions can be answered with data. It’s important to form good questions when using data to make data-informed decisions. According to Barton Poulson in “Data Fluency: Exploring and Describing Data” (2019), using data involves taking a big-picture view to ask:

What are the most common values in the data (averages)?
How much do things vary (ranges and outliers)?
Is there a cause-and-effect relationship (causation vs. correlation)?
What causes can you control (experimental controls)?
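
The first three questions can be explored with a few lines of code. The following is a minimal sketch in Python using only the standard library. The sample values, the function names describe and pearson_r, and the 1.5 × IQR outlier rule are assumptions chosen for illustration; they do not come from the Poulson course. A strong correlation here would still say nothing about cause and effect.

```python
import math
import statistics

# Hypothetical sample: weekly study hours and exam scores for eight students.
hours = [2, 3, 3, 4, 5, 6, 7, 20]
scores = [61, 64, 66, 70, 74, 79, 83, 85]


def describe(values):
    """Summarize common values, spread, and outliers for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Assumed rule of thumb: flag values more than 1.5 * IQR beyond the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "outliers": [v for v in values if v < low or v > high],
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient: measures linear association, not causation."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


summary = describe(hours)
r = pearson_r(hours, scores)
print(summary)
print(f"correlation between hours and scores: {r:.2f}")
```

In this sample the single value of 20 hours pulls the mean well above the median, which is one reason to look at more than one kind of average before drawing conclusions.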

Get organized

Tips for staying organized when collecting data:

Label your data with descriptive and consistent file names that are easy to understand, even if you open your data several years from now. See examples of file naming schemas from UW–Madison Libraries Research Data Services.
Organize your files in well-defined folders. Avoid over-nesting. Shallow folder structures, only two or three levels deep, are easier to browse.
Use tools to track versions of your data to make it easier to revert to a previous version (tips on using version control). Google Docs, Microsoft 365, and Box all have built-in version control to help track the changes made to each version of a file.
Document all changes made in your processing and handling of the data.
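
One way to keep file names descriptive and consistent is to generate them from the same parts every time. The sketch below is a hypothetical convention (date, project, description, zero-padded version), not the schema recommended by Research Data Services; the function name build_file_name and all sample values are assumptions for illustration.

```python
import re
from datetime import date


def build_file_name(project, description, file_date, version, extension):
    """Build a name such as 20240315_enrollment_fall-headcount_v02.csv."""

    def slug(text):
        # Lowercase the text and replace runs of other characters with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # A leading YYYYMMDD date makes files sort chronologically in a folder listing.
    return (
        f"{file_date:%Y%m%d}_{slug(project)}_{slug(description)}"
        f"_v{version:02d}.{extension}"
    )


name = build_file_name("Enrollment", "Fall Headcount", date(2024, 3, 15), 2, "csv")
print(name)
```

Whatever convention you choose, writing it down once and applying it everywhere matters more than the specific order of the parts.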

Five Ways to Document Your Data

How understandable will your data be to other users, including yourself in the future? This guide considers a few best practices for documenting data, from just getting started to more complex tools.

Cite your data source
Define your data using a data dictionary
Create a metadata package describing the entire dataset (e.g., data specification or readme file)
Track the lineage of your data using data models, source-to-target mappings, and all transformations
Capture the reproducible environment of all changes made to your data

Learn the five ways to document your data
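
As an illustration of the second item, a data dictionary can start as a small table with one row per column in the dataset. The sketch below is a minimal example in Python; the column names, the four dictionary fields, and the helper functions dictionary_to_csv and undocumented_columns are hypothetical and not part of any UW–Madison template.

```python
import csv
import io

FIELDS = ["column", "type", "description", "allowed_values"]

# Hypothetical data dictionary: one entry per column in the dataset.
DATA_DICTIONARY = [
    {"column": "student_id", "type": "string",
     "description": "De-identified study ID", "allowed_values": ""},
    {"column": "term", "type": "string",
     "description": "Academic term of enrollment",
     "allowed_values": "fall|spring|summer"},
    {"column": "credits", "type": "integer",
     "description": "Credits enrolled in the term", "allowed_values": "0-24"},
]


def dictionary_to_csv(entries):
    """Render the data dictionary as CSV text that can be saved next to the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(dataset_columns, entries):
    """List dataset columns that have no entry in the data dictionary."""
    documented = {entry["column"] for entry in entries}
    return [name for name in dataset_columns if name not in documented]


print(dictionary_to_csv(DATA_DICTIONARY))
print(undocumented_columns(["student_id", "term", "gpa"], DATA_DICTIONARY))
```

Running a check like undocumented_columns whenever the dataset changes is one way to notice when the documentation has fallen behind the data.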

Data analysis and computing tools

There are many ways to analyze data. Statistical analysis, computational analysis, reporting of aggregate or clustered responses, and exploratory visualization each provide ways to derive insights from your data. Many books, encyclopedias, and handbooks offer discipline-specific training on research methods; for example, see Sage Research Methods, available online through the UW–Madison Libraries. One of the overarching goals of data analysis is to apply methods and use tools in ways that are well documented and reproducible.

Services available to UW–Madison users:

Center for High Throughput Computing (CHTC) – Access to big data computing resources is provided to all UW–Madison researchers, and most base services are free of charge.
Research Cyberinfrastructure – DoIT’s Research Cyberinfrastructure group supports data science platforms for research and several cloud computing services.
Social Science Computing Cooperative (SSCC) – Supports researchers at UW–Madison who use statistical analysis in their work.
UW–Madison Design Lab – Offers software, training, assistance, and inspiration for visualizing your data.

Tip: Download popular licensed software from the UW–Madison Campus Software Library.

Spreadsheets

OpenRefine
Microsoft Excel
Google Sheets
Tableau

Statistical software

Qualtrics
Quantitative
- SPSS
- SAS
- Stata
Qualitative
- NVivo
- ATLAS.ti

Programming languages

R: Many packages for data visualization
Python: Frequently used in machine learning and data science
SQL: Communicates with databases and data warehouses
Julia: Ability to call libraries from Python, C, and Fortran

Interactive data visualization

Tableau: interactive visualization
Piktochart: Infographics
ArcGIS: Maps

Convey Data

Consider how the data you collect and use will impact the people and communities that this data represents. This includes not only following all laws and local policies but also considering any negative repercussions that may result from using this data. Ask how your use of this data might affect your relationship with the individuals represented in the data.

Data ethics and misuse

The use of data should build trust and improve relationships. Lack of trust may result in low response rates or individuals providing incorrect or bad data.

Ethical gathering and use of data involves:

Consent: Permission to gather the data
Informed consent: A person must understand what they are consenting to
Voluntary consent: Consent is given freely and without coercion. An example of coercion is requiring someone to provide data in order to complete a class or do their job.
Anonymity: A person cannot be linked to or identified by the data
Confidentiality: The data will not be shared with anyone without the person's consent

Reciprocity and Respect

The individuals providing data should be treated with respect and should directly benefit from the data use. Historically marginalized communities, for example, have experienced negative impacts from data exploitation and misuse. This is one of the reasons why members of global Indigenous communities developed the CARE principles, which recommend that Indigenous data and Indigenous knowledge be used for collective benefit, that authority to control the data be returned to the Indigenous group, and that the responsibilities and ethics agreed on by all parties be upheld. One way to show reciprocity is to follow up with data providers and share summary results (aggregated and de-identified) or other important findings from the data use. In short, don’t leave people feeling exploited or unheard with a take-their-data-and-run approach.
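
Sharing results in aggregate usually means reporting counts per group rather than individual records, and hiding groups so small that a person could be singled out. The sketch below is a minimal illustration in Python. The minimum cell size of 5, the function name aggregate_counts, and the sample records are assumptions; check the policy that applies to your data for the actual threshold, and note that suppressing small counts does not by itself guarantee anonymity.

```python
from collections import Counter

# Assumed threshold for illustration only; your institution's policy may differ.
MIN_CELL_SIZE = 5


def aggregate_counts(records, group_field, min_cell_size=MIN_CELL_SIZE):
    """Count records per group and mask any group smaller than min_cell_size."""
    counts = Counter(record[group_field] for record in records)
    return {
        group: (n if n >= min_cell_size else f"<{min_cell_size}")
        for group, n in sorted(counts.items())
    }


# Hypothetical survey responses: identifiers are dropped in the summary.
responses = (
    [{"respondent": f"r{i}", "college": "Engineering"} for i in range(7)]
    + [{"respondent": f"r{i}", "college": "Education"} for i in range(7, 10)]
)

summary_counts = aggregate_counts(responses, "college")
print(summary_counts)
```

The summary contains only group labels and counts, so it can be returned to the people who provided the data without exposing any individual response.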