Help:Managing Data Quality

Revision as of 18:53, 21 May 2023
DataAnalyst (Talk | contribs)

Revision as of 18:54, 21 May 2023
DataAnalyst (Talk | contribs)
(Help:Data Quality renamed to Help:Managing Data Quality)

Managing Data Quality

Finding Data Quality Issues

WeRelate provides a few tools to help contributors find questionable information and/or duplicates so that they can improve the quality of WeRelate data.

These tools include:

  • A Data Quality Issues list that can be reviewed across the entire site or filtered to focus on pages of interest to the contributor. This list is updated daily.
  • Automated "questionable information" messages appearing on Person and Family pages. These messages are the same as the Error and Anomaly messages on the Data Quality Issues list. These messages are produced in real time, so that contributors are notified of issues on pages they just added or changed.
  • A Person page displays messages specific to the Person page (e.g., events out of order) and to the person's relationship to his/her parents (e.g., born after mother's death, born before parents' marriage).
  • A Family page displays messages specific to the Family page (e.g., invalid date) and the involvement of the husband and wife (e.g., married after death of husband). It also displays messages for each child's relationship to their parents and the parents' marriage (e.g., born before mother was 4, born before parents' marriage).
  • A Duplicate families report to identify potential duplicate families to either merge or mark as not being a duplicate. This list is used by volunteers who have signed up for the Duplicate pages patrol, and are comfortable with completing merges.
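The relationship checks named in the list above can be illustrated with a minimal sketch. It is not WeRelate's actual code: the `Person` and `Family` classes, the `family_messages` function, and the use of bare years instead of full dates are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical, simplified record: years only, no full dates.
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


@dataclass
class Family:
    husband: Person
    wife: Person
    marriage_year: Optional[int] = None
    children: List[Person] = field(default_factory=list)


def family_messages(family: Family) -> List[str]:
    """Return questionable-information messages for one family."""
    messages = []
    married = family.marriage_year

    # Checks on the involvement of the husband and wife.
    if married is not None:
        for role, spouse in (("husband", family.husband), ("wife", family.wife)):
            if spouse.death_year is not None and married > spouse.death_year:
                messages.append(f"Married after death of {role}")

    # Checks on each child's relationship to the parents and the marriage.
    for child in family.children:
        if child.birth_year is None:
            continue  # nothing to compare without a birth year
        mother_death = family.wife.death_year
        if mother_death is not None and child.birth_year > mother_death:
            messages.append(f"{child.name}: Born after mother's death")
        if married is not None and child.birth_year < married:
            messages.append(f"{child.name}: Born before parents' marriage")

    return messages
```

Missing years are skipped rather than reported, so a check runs only when both values it compares are known.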

Addressing Data Quality Issues

Some data quality messages indicate an error (e.g., born after mother died), while others indicate an unusual but possible situation (e.g., older than 80 at marriage).

If the data is incorrect, it should be fixed. As with any data entry on WeRelate, appropriate sources should be added to the page to support the data. In particular, when a year looks like a typo, don't assume that only one particular digit (such as the century or the decade) is wrong. For example, 1984 might be a typo for 1884 or for 1894.
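The point about year typos can be made concrete with a small sketch that lists years reachable from a suspect year by one common kind of typing slip: changing a single digit, or swapping two adjacent digits. The function name `candidate_years` and the `earliest`/`latest` bounds are hypothetical, and the output is only a list of possibilities to check against sources, not a correction.

```python
def candidate_years(year, earliest, latest):
    """List years that differ from `year` by one digit or one adjacent swap."""
    digits = str(year)
    candidates = set()

    # Single-digit substitutions, e.g. 1984 -> 1884.
    for i in range(len(digits)):
        for d in "0123456789":
            if d != digits[i]:
                candidates.add(digits[:i] + d + digits[i + 1:])

    # Adjacent transpositions, e.g. 1984 -> 1894.
    for i in range(len(digits) - 1):
        if digits[i] != digits[i + 1]:
            candidates.add(digits[:i] + digits[i + 1] + digits[i] + digits[i + 2:])

    # Keep only candidates without a leading zero that fall in the given range.
    return sorted(
        int(c) for c in candidates
        if not c.startswith("0") and earliest <= int(c) <= latest
    )
```

For a person whose other events fall in the nineteenth century, `candidate_years(1984, 1800, 1899)` includes both 1884 and 1894, which is why sources are needed to choose between them.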

If the data is correct, it can be marked as such, which will prevent the message from being displayed again. Do this only after adding appropriate sources that show the data is correct. The Data Quality Issues list makes it easy to mark an issue as correct (verified). The templates below can also be used.

If you can't determine whether the data is correct, you can mark it as "deferred". This records that you have already looked at the issue and either couldn't resolve it or chose not to take the time to.
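The three outcomes described above (fix, mark as verified, mark as deferred) can be pictured as a status on each issue. The sketch below is an illustration only: the `Status` and `Issue` names are hypothetical, and showing deferred issues with a "(deferred)" label is an assumption, not a description of how the Data Quality Issues list actually behaves.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    OPEN = "open"          # not yet reviewed
    VERIFIED = "verified"  # data confirmed correct, with sources added
    DEFERRED = "deferred"  # reviewed, but not resolved


@dataclass
class Issue:
    page: str
    message: str
    status: Status = Status.OPEN


def issues_to_display(issues: List[Issue]) -> List[str]:
    """Hide verified issues; flag deferred ones as already reviewed."""
    lines = []
    for issue in issues:
        if issue.status is Status.VERIFIED:
            continue  # verified issues are not displayed again
        suffix = " (deferred)" if issue.status is Status.DEFERRED else ""
        lines.append(f"{issue.page}: {issue.message}{suffix}")
    return lines
```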