Help:Managing Data Quality

(Difference between revisions)
Revision as of 18:54, 21 May 2023
DataAnalyst
(Help:Data Quality renamed to Help:Managing Data Quality)
Current revision (14:22, 22 May 2023)
DataAnalyst

 
Line 1:
-==Managing Data Quality==
+==Finding Data Quality Issues==
-===Finding Data Quality Issues===
+
WeRelate provides a few tools to help contributors find questionable information and/or duplicates so that they can improve the quality of WeRelate data.
These tools include:
-* A [[Help:Monitoring Data Quality|Data Quality Issues]] list that can be reviewed across the entire site or filtered to focus on pages of interest to the contributor. This list is updated daily.
+* A '''[[Help:Monitoring Data Quality|Data Quality Issues]] list''' that can be reviewed across the entire site or filtered to focus on pages of interest to the contributor. This list is updated daily.
-* Automated "questionable information" messages appearing on Person and Family pages. These messages are the same as the Error and Anomaly messages on the Data Quality Issues list. These messages are produced in real time, so that contributors are notified of issues on pages they just added or changed.
+* Automated '''"questionable information" messages''' appearing on Person and Family pages. These messages are the same as the Error and Anomaly messages on the Data Quality Issues list. These messages are produced in real time, so that contributors are notified of issues on pages they just added or changed.
:* A Person page displays messages specific to the Person page (e.g., events out of order) and the person's relationship to his/her parents (e.g. born after mother's death, born before parents' marriage).
:* A Family page displays messages specific to the Family page (e.g., invalid date) and the involvement of the husband and wife (e.g., married after death of husband). It also displays messages for each child's relationship to their parents and the parents' marriage (e.g., born before mother was 4, born before parents' marriage).
-* A [[WeRelate:Duplicate pages patrol|Duplicate families]] report to identify potential duplicate families to either merge or mark as not being a duplicate. This list is used by volunteers who have signed up for the Duplicate pages patrol, and are comfortable with completing merges.
+* A '''[[WeRelate:Duplicate pages patrol|Duplicate families]] report''' to identify potential duplicate families to either merge or mark as not being a duplicate. This list is used by volunteers who have signed up for the Duplicate pages patrol, and are comfortable with completing merges.
-===Addressing Data Quality Issues===
+==Addressing Data Quality Issues==
Some data quality messages indicate an error (e.g., born after mother died), while others indicate an unusual but possible situation (e.g., older than 80 at marriage).
-If the data is incorrect, it should be fixed. As with any data entry on WeRelate, appropriate sources should be added to the page to support the data. In particular, don't assume that a typo in a year involves only the century or only the decade. For example, 1984 might be a typo for 1884 or for 1894.
+'''If the data is incorrect''', it should be fixed. As with any data entry on WeRelate, support the correction with sources. In particular, don't assume that a typo in a year involves only the century or only the decade. For example, 1984 might be a typo for 1884 or for 1894.
-If the data is correct, it can be marked as such, which will prevent it from being displayed again. This should only be done once appropriate sources are added to prove that the data is correct. The Data Quality Issues list makes it easy to mark an issue as correct (verified). The templates below can also be used.
+'''If the data is correct''', it can be marked as such, which will prevent the issue from being displayed again. This should only be done once appropriate sources are added to prove that the data is correct. The [[Help:Monitoring Data Quality|Data Quality Issues]] list makes it easy to mark an issue as correct (verified), which adds a template to the talk page. The templates (which are listed below) can also be added manually.
-If you can't determine whether or not the data is correct, you can mark it as "deferred", which will remind you that you looked at it already and couldn't (or chose not to take the time to) resolve it.
+'''If you can't determine whether or not the data is correct''', you can mark it as "deferred", which will remind you that you looked at it already and couldn't (or chose not to take the time to) resolve it. The [[Help:Monitoring Data Quality|Data Quality Issues]] list makes it easy to mark an issue as deferred, which adds a template to the talk page. The template (listed below) can also be added manually.
+
+==Frequently Asked Questions==
+{{Help:FAQ/Managing Data Quality}}
+
+==Data Quality Templates==
+{{Help:Managing Data Quality/Templates}}

Current revision

Finding Data Quality Issues

WeRelate provides a few tools to help contributors find questionable information and/or duplicates so that they can improve the quality of WeRelate data.

These tools include:

  • A Data Quality Issues list that can be reviewed across the entire site or filtered to focus on pages of interest to the contributor. This list is updated daily.
  • Automated "questionable information" messages appearing on Person and Family pages. These are the same as the Error and Anomaly messages on the Data Quality Issues list, but they are produced in real time, so contributors are notified of issues on pages they have just added or changed.
      • A Person page displays messages specific to the Person page (e.g., events out of order) and to the person's relationship to his or her parents (e.g., born after mother's death, born before parents' marriage).
      • A Family page displays messages specific to the Family page (e.g., invalid date) and to the involvement of the husband and wife (e.g., married after death of husband). It also displays messages for each child's relationship to their parents and the parents' marriage (e.g., born before mother was 4, born before parents' marriage).
  • A Duplicate families report that identifies potential duplicate families to either merge or mark as not duplicates. This report is used by volunteers who have signed up for the Duplicate pages patrol and are comfortable with completing merges.
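
As a rough illustration of the kind of check behind the Person page messages listed above, here is a minimal Python sketch. It is not WeRelate's actual implementation: the `Person` record, the `person_messages` function, and the year-only comparison are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical record for illustration; years only, None when unknown.
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def person_messages(child: Person,
                    mother: Optional[Person],
                    parents_marriage_year: Optional[int]) -> List[str]:
    """Return messages worded like the examples quoted in the list above."""
    messages = []
    # A death recorded before the birth means the events are out of order.
    if (child.birth_year is not None and child.death_year is not None
            and child.death_year < child.birth_year):
        messages.append("Events out of order")
    # Compare the child's birth with the mother's death, when both are known.
    if (mother is not None and child.birth_year is not None
            and mother.death_year is not None
            and child.birth_year > mother.death_year):
        messages.append("Born after mother's death")
    # Compare the child's birth with the parents' marriage, when both are known.
    if (parents_marriage_year is not None and child.birth_year is not None
            and child.birth_year < parents_marriage_year):
        messages.append("Born before parents' marriage")
    return messages
```

For example, a child born in 1850 whose mother died in 1845 and whose parents married in 1855 would receive the last two messages. Unknown dates are skipped rather than flagged, which is a design choice of this sketch.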

Addressing Data Quality Issues

Some data quality messages indicate an error (e.g., born after mother died), while others indicate an unusual but possible situation (e.g., older than 80 at marriage).

If the data is incorrect, it should be fixed. As with any data entry on WeRelate, support the correction with sources. In particular, don't assume that a typo in a year involves only the century or only the decade. For example, 1984 might be a typo for 1884 or for 1894.
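
To see why a year typo should not be assumed to affect a single digit, the following Python sketch lists the years in a given range that differ from the entered year in at most a chosen number of digits. The function name `candidate_years` and its parameters are hypothetical, and the output is only a list of possibilities; sources are still needed to decide which year is correct.

```python
from typing import List


def candidate_years(entered: int, earliest: int, latest: int,
                    max_changed: int = 2) -> List[int]:
    """List years in [earliest, latest] that differ from `entered`
    in at least one and at most `max_changed` digit positions."""
    entered_digits = str(entered)
    candidates = []
    for year in range(earliest, latest + 1):
        digits = str(year)
        if len(digits) != len(entered_digits):
            continue  # only compare years with the same number of digits
        changed = sum(a != b for a, b in zip(digits, entered_digits))
        if 1 <= changed <= max_changed:
            candidates.append(year)
    return candidates
```

With `max_changed=1`, the only candidate for 1984 in the range 1800 to 1899 is 1884. With the default of 2, the list also includes 1894, matching the example in the paragraph above.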

If the data is correct, it can be marked as such, which will prevent the issue from being displayed again. This should only be done once appropriate sources are added to prove that the data is correct. The Data Quality Issues list makes it easy to mark an issue as correct (verified), which adds a template to the talk page. The templates (which are listed below) can also be added manually.

If you can't determine whether or not the data is correct, you can mark it as "deferred", which will remind you that you looked at it already and couldn't (or chose not to take the time to) resolve it. The Data Quality Issues list makes it easy to mark an issue as deferred, which adds a template to the talk page. The template (listed below) can also be added manually.

Frequently Asked Questions

How do I remove a message from "Questionable information"?

When the data is correct

I've already checked the data and found it to be correct. How do I stop the "To check" message from being displayed?

First, make sure the page(s) cite sources to support the data.

The easiest way to suppress the message is to find it on the Data Quality Issues list and select the Verified button. Enter comments (such as "see birth records of both mother and child") and select OK.

Alternatively, add the appropriate Data Quality template manually.

When I don't know how to fix the data

I've looked at the data but don't know how to fix it. Can I stop the "To fix" message from being displayed?

No, you can't stop the message from being displayed, but you can leave a record of the fact that you couldn't resolve it.

The easiest way to do this is to find the message on the Data Quality Issues list and select the Defer button. Enter comments (such as "requires a source I don't have access to") and select OK.

Alternatively, add the DeferredIssues template manually.

When you come back to this page, before reviewing questionable information again, take a look at the talk page, which will let you know that you already reviewed the issues on this page.

Why don't "Questionable information" messages show up when I look at the history of a page?

The logic to determine issues as of a point in time in the past is complicated and it was deemed unnecessary to put the effort into making this work.

Data Quality Templates

When there is proof that the data is correct

The following templates suppress automated "Questionable information" messages, and are intended for use only after establishing source documentation to support the relevant data.

Template                                Message it suppresses
Template:BirthBeforeParentsMarriage     Born before parents' marriage
Template:BirthLongAfterParentsMarriage  Born over 35 years after parents' marriage
Template:BaptismAfterMothersDeath       Christened/baptized after mother died
Template:BaptismWellAfterFathersDeath   Christened/baptized more than 1 year after father died
Template:UnusuallyYoungMother           Born before mother was 12
Template:UnusuallyYoungFather           Born before father was 15
Template:UnusuallyOldMother             Born after mother was 50
Template:UnusuallyOldFather             Born after father was 70
Template:UnusuallyYoungWife             Wife younger than 12 at marriage
Template:UnusuallyYoungHusband          Husband younger than 12 at marriage
Template:UnusuallyOldWife               Wife older than 80 at marriage
Template:UnusuallyOldHusband            Husband older than 80 at marriage
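
The four parent-age rows of the table above can be read as a small lookup from an age limit to a template name. The Python sketch below shows one way to express that, using the limits from the table. The `PARENT_AGE_RULES` structure and the `matching_templates` function are hypothetical, and ages are approximated from birth years alone, so this is an illustration and not how WeRelate evaluates the rules.

```python
from typing import Dict, List, Optional

# (template, parent, kind, age limit in years), taken from the table above.
PARENT_AGE_RULES = [
    ("Template:UnusuallyYoungMother", "mother", "younger", 12),
    ("Template:UnusuallyYoungFather", "father", "younger", 15),
    ("Template:UnusuallyOldMother", "mother", "older", 50),
    ("Template:UnusuallyOldFather", "father", "older", 70),
]


def matching_templates(child_birth_year: int,
                       parent_birth_years: Dict[str, Optional[int]]) -> List[str]:
    """Return the templates whose message would apply to this child."""
    matches = []
    for template, parent, kind, limit in PARENT_AGE_RULES:
        parent_birth = parent_birth_years.get(parent)
        if parent_birth is None:
            continue  # unknown birth year: nothing to compare
        # Approximate age of the parent at the child's birth (years only).
        age = child_birth_year - parent_birth
        if (kind == "younger" and age < limit) or (kind == "older" and age > limit):
            matches.append(template)
    return matches
```

For a child born in 1900 to a mother born in 1845 and a father born in 1820, the sketch returns the UnusuallyOldMother and UnusuallyOldFather templates. Whether adding either template is appropriate still depends on having sources that confirm the dates.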

When an issue hasn't been resolved

The following template can be used to remind yourself that you looked at a data quality issue and chose not to or were unable to resolve it at that time.

Template:DeferredIssues