Description
This course is an introduction to data quality, particularly within the criminal justice system and corrections. Accurate and reliable data are crucial for evidence-based decision-making, for accountability and transparency, and for building and maintaining the trust needed to guide policy. This course provides the guidance necessary to structure trustworthy and accessible data for high-quality analysis in corrections that informs agency leaders and policymakers.
Through this course, you will gain a comprehensive introduction to the fundamental components of data quality. Specifically, you will learn what data quality is and why it’s important, explore the four dimensions of data quality (intrinsic, contextual, representational, and accessibility), and learn how to assess the quality of your data. The course takes a theoretical approach, giving you a deeper understanding of the foundational components of data quality.
By the end of this course, you will understand the multiple dimensions of data quality and the best practices for maintaining high-quality data. Because criminal justice data are administrative data maintained as part of routine operations, this course focuses on ensuring data quality in administrative datasets. We recognize that many of these practices may require larger institutional changes that are in the hands of either the research director or the Department of Corrections (DOC) secretary/commissioner. Our goal is to help you understand what constitutes quality data so you can assess your data and make informed decisions about managing and using it. We hope this course helps you apply these best practices in your work and better equips you to talk with leadership and colleagues about ways to improve data quality.
Key Concepts
This course focuses on four interrelated dimensions of data quality that together span the stages of the data lifecycle. Here we briefly explain each dimension and its relevance for DOCs to give you an overview of what you will learn in this course.
Intrinsic Data Quality
This dimension refers to the characteristics that make data accurate, reliable, credible, and impartial. It is pivotal to ensuring quality data because it covers the earliest stages of data collection and data management. For DOCs, maintaining intrinsic data quality through data governance standards and audits ensures that the data in your system represent reality. High intrinsic data quality builds trust in your administrative data and supports decision-making based on it.
Contextual Data Quality
This dimension focuses on the appropriateness of data for its intended use. For DOCs, it ensures that your data system tracks the information you need for tasks such as meeting reporting requirements, managing people convicted of crimes, and informing decision-making. It also matters when you receive requests for specific reports or analyses: contextual data quality ensures that you have the data needed to conduct those analyses appropriately when they arise.
Representational Data Quality
This dimension focuses on the understandability, interpretability, and consistency of your data. It ensures that users and stakeholders can understand the data, its nuances, and its limitations so they can draw informed conclusions from it. While some of this depends on the users of the data and may be outside the DOC’s control, you can promote consistent interpretation by defining the data with as little ambiguity as possible and including clear definitions of measures. Maintaining representational data quality helps DOCs ensure that administrative data are not only accurate but also useful and actionable for decision-making.
Accessibility Data Quality
This dimension focuses on the extent to which data are available and the ease with which authorized users can access them in ways that align with data security and safety standards. Accessibility goes beyond making data easy to reach: it also requires considering how easily users can manipulate and use the data. Maintaining accessibility helps a DOC protect sensitive data while ensuring that the data are securely available for analysis by authorized users.
Goals and Objectives
By the end of this course, you should be able to do the following:
- Recognize the importance of maintaining data quality.
- Understand the dimensions of data quality and how they interconnect.
- Accurately use standard terminology across data quality dimensions.
- Identify best practices in data quality for successful future analyses and data sharing.
- Document existing data quality in codebooks or data dictionaries for internal and external partners.
- Explore, visualize, and summarize data using advanced methods and tools to assess and document data quality.
- Develop strategies for improving data quality across different stages of the data lifecycle.
Prerequisites
There are no prerequisites for this course. It is designed as an introduction to the foundational elements of data quality and does not require previous knowledge of data quality or its dimensions.
Structure
This course is organized into lessons that are meant to be completed in sequential order. Each lesson includes an explanation of the topic, relevant examples, and best practices, presented through a combination of written materials and videos. At the beginning of the course, we introduce a data quality assessment, which gives you an overview of what you would need to do to assess and improve the quality of the data in your system. After you finish the course, you can revisit the assessment to reinforce the concepts covered. Each lesson also includes a learning check to ensure you understand its key concepts before moving on to the next one.
The lessons and their learning objectives are as follows:

Lesson 1: Introduction to Data Quality
- Understand the definition of data quality.
- Learn key terms related to data quality.
- Understand why data quality is important.
- Learn about the dimensions of data quality.

Lesson 2: Conducting a Data Quality Assessment
- Understand how to comprehensively assess data quality.
- Assess data on all four dimensions of data quality.
- Use data exploration to summarize and assess data quality.
- Integrate data visualization into your data quality assessment.
- Understand and develop quality assurance policies to maintain data quality.
- Develop data documentation for internal and external use.

Lesson 3: Intrinsic Data Quality
- Learn the definition of intrinsic data quality.
- Recognize the importance of maintaining high intrinsic data quality.
- Understand the attributes of intrinsic data quality.
- Learn about best practices to maintain intrinsic data quality.

Lesson 4: Contextual Data Quality
- Learn the definition of contextual data quality.
- Recognize the importance of maintaining high contextual data quality.
- Understand the attributes of contextual data quality.
- Learn about best practices to maintain contextual data quality.

Lesson 5: Representational Data Quality
- Learn the definition of representational data quality.
- Recognize the importance of maintaining high representational data quality.
- Understand the attributes of representational data quality.
- Learn about best practices to maintain representational data quality.

Lesson 6: Accessibility Data Quality
- Learn the definition of accessibility data quality.
- Recognize the importance of maintaining high accessibility data quality that aligns with data safety and security standards.
- Understand the attributes of accessibility data quality.
- Learn about best practices to maintain accessibility data quality in line with security standards.

Lesson 7: Missing Data
- Understand what missing data is and how it occurs.
- Recognize the problems that arise when data is missing.
- Learn about different types of missing data and the implications of each type.
- Identify strategies to minimize the prevalence of missing data.
Estimated Course Length
10 hours