Help:Monitoring Data Quality

Revision as of 23:00, 11 June 2022

About the Data Quality Issues page

WeRelate allows you to check for possible errors in your data by visiting the Data Quality Issues page.

Eventually, the Data Quality Issues page will also support the Data Quality Patrol function: routine monitoring to catch errors such as typos so they can be fixed while the contributor is still focused on that part of the tree. This routine monitoring will become feasible once the backlog of existing errors and anomalies is reduced. Please consider volunteering to help reduce the backlog.

Description and definitions

  • Data quality issues are identified by a job that runs periodically. The Data Quality Issues page shows the results from the last time the job ran, and the run date/time is displayed at the top of the page. Because of the way the processing is scheduled, the data that was checked may be up to 12 hours older than the run date/time.
  • You cannot request a real-time issue check. If you just added or changed some data, you'll have to wait for the next run (or possibly the one after) to see any issues it raises.
    • If you open an issue and the data doesn't match the message, check the page history to see whether someone else fixed the issue within the last day or so.
  • Issues may be:
    • Anomalies - situations that are unusual enough to warrant review but might be correct, such as a person who married at age 6 or a person who was born before their parents were married
    • Errors - situations that are not correct, such as a person who married after they died, or a person who was born before a parent was born
    • Incomplete data - situations where minimal data about a person, such as gender, is missing
  • Note: situations where sources are missing or incomplete might be added to this list in the future (or possibly to a separate list).
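The job that finds these issues is not documented on this page. As an illustration only, the Python sketch below shows how checks like the examples above could sort problems into the three categories. The `Person` fields, the `find_issues` function and the `MIN_MARRIAGE_AGE` cutoff are all hypothetical; the real job's rules and thresholds may differ, and this sketch compares years only, for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical cutoff. The page gives "married at age 6" only as an example
# of an anomaly; the threshold the real job uses (if any) is not documented.
MIN_MARRIAGE_AGE = 12


@dataclass
class Person:
    """Hypothetical, simplified Person record: years only, None = not recorded."""
    name: str
    gender: Optional[str] = None
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    marriage_year: Optional[int] = None
    parent_birth_years: List[Optional[int]] = field(default_factory=list)
    parents_marriage_year: Optional[int] = None


def find_issues(p: Person) -> List[Tuple[str, str]]:
    """Return (category, message) pairs using the example rules on this page."""
    issues = []
    # Incomplete data: minimal information such as gender is missing.
    if p.gender is None:
        issues.append(("incomplete", "gender is missing"))
    # Error: married after death.
    if (p.marriage_year is not None and p.death_year is not None
            and p.marriage_year > p.death_year):
        issues.append(("error", "married after death"))
    if p.birth_year is not None:
        # Error: born before a parent was born.
        if any(pb is not None and p.birth_year < pb for pb in p.parent_birth_years):
            issues.append(("error", "born before a parent was born"))
        # Anomaly: unusually young at marriage (might still be correct).
        if p.marriage_year is not None:
            age = p.marriage_year - p.birth_year
            if age < MIN_MARRIAGE_AGE:
                issues.append(("anomaly", f"married at age {age}"))
        # Anomaly: born before the parents married (might still be correct).
        if (p.parents_marriage_year is not None
                and p.birth_year < p.parents_marriage_year):
            issues.append(("anomaly", "born before parents married"))
    return issues


# Example: one record that trips an incomplete-data check and two anomaly checks.
example = Person("Example", birth_year=1800, marriage_year=1806, death_year=1850,
                 parent_birth_years=[1770, 1772], parents_marriage_year=1801)
print(find_issues(example))
# [('incomplete', 'gender is missing'), ('anomaly', 'married at age 6'),
#  ('anomaly', 'born before parents married')]
```

Note that the two anomalies in this example would still appear after review if the data turned out to be correct; that is what the "Verified by me" button is for.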

Interacting with the list

Click the links on the list to see and correct issues. In addition, you can:

  • Mark an anomaly as verified by clicking the "Verified by me" button. This means that you have reviewed the situation and determined that the data is correct - for example, that a person was truly born before their biological parents married (i.e., was not from a previous marriage/relationship of one of the parents).
    • Before marking an issue as verified, make sure that the page (or a related page, such as the Family page) cites sources showing that the information is correct.
    • When you select the "Verified by me" button, a template is added to the Talk page of the indicated Person or Family page (the Talk page will be automatically created if it doesn't already exist). This template identifies you and the date you clicked the "Verified by me" button. Others will see this information when they open the Talk page.
  • Defer an issue by clicking the "Defer" button. This lets you keep track of issues that you are not ready to address yet, or may never address.
    • For example:
      • Maybe you are working on your own project but choose to clean up a few issues each day and are looking for "low-hanging fruit" such as simple date typos. You might want to defer larger problems, such as a page that conflates two individuals, until you are prepared to devote the time required for the necessary research.
      • Maybe you need to ask a family member for the correct data and are waiting for a reply.
      • Maybe you don't have the necessary expertise or access to sources to resolve the issue.
      • Maybe there are conflicting sources (such as one source saying the christening date was 6 Apr 1635 and another source saying the birth date was 3 Apr 1636), and you don't believe the issue is resolvable unless new sources are found. You might make a judgment call that the issue (event before birth) can simply be ignored.
    • When you select the "Defer" button, a template is added to the Talk page of the indicated Person or Family page (the Talk page will be automatically created if it doesn't already exist). You will have an opportunity to add a comment (e.g., "conflated persons", "waiting for a reply", "conflicting sources; issue can be ignored"). The template identifies you and the date you clicked the "Defer" button, and includes the comment. Others will see this information when they open the Talk page.

Why the "Defer" button?

The "Defer" button lets users keep track of issues they choose to set aside for now, so that they can focus their data correction efforts. It is also intended to keep users from marking anomalies as "verified" simply to get them off the list, which can be tempting. Deferring is a way to say "I don't know whether or not the data is correct" while still recording that the issue was looked at. Maybe someone else will be able to resolve the issue, or maybe it will remain unresolved due to conflicting or limited sources.

Filtering the list

The list is automatically filtered when you first open it:

  • If you select Data Quality Issues from the My Relate menu, the list reflects your watchlist.
  • If you select check beside a MyTree on the Manage My Trees page, the list reflects that MyTree.
  • If you select Data Quality patrol from the Volunteer portal, the list shows issues across the entire database.
  • In all cases, when you first open the list, verified anomalies are excluded.

In addition, you can filter the list by:

  • category (anomalies, errors, incomplete data)
  • century of the birth year as stated on the Person page (keeping in mind that it might be incorrect); this restricts the list to messages associated with Person pages

You can also choose to include verified anomalies.

  • If you choose to include verified anomalies, the list will indicate who verified each anomaly. An anomaly can be verified by more than one user - in fact, a second set of eyes can increase the reliability of the data, since everyone makes mistakes at some point.

If you are signed in, you can switch between showing one MyTree, your entire watchlist, or the entire database:

  • For performance reasons, the MyTrees and watched/unwatched filters can't be used at the same time. If you want to filter on a MyTree, make sure you have selected "Watched and unwatched". If you want to filter on your watchlist, make sure the MyTree filter is set to "Whether or not in".

For performance reasons, when you filter on a MyTree or your watchlist, the system will restrict the number of issues displayed at a time. This is automatic, and you will be informed of the limit. Expect it to take several seconds for the list to appear.

Also for performance reasons, you cannot filter out deferred issues. Instead, any issue that you previously deferred is noted as such (if you are signed in).

List order

The list is in alphabetical order: Person pages sorted by last name and then first name, followed by Family pages sorted by page title. The order of Family pages might change in the future.
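The ordering described above amounts to a two-part sort key: Person pages first, by last name and then first name, then Family pages by title. In the Python sketch below, the `Issue` fields, the `list_order_key` function and the sample page titles are hypothetical, and it uses plain string comparison; how the real list handles letter case, accents or missing names is not documented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Issue:
    """Hypothetical issue record; only the fields needed for ordering."""
    page_type: str                 # "Person" or "Family"
    title: str                     # page title
    surname: Optional[str] = None  # Person pages only
    given: Optional[str] = None    # Person pages only


def list_order_key(issue: Issue) -> Tuple[int, str, str, str]:
    """Person pages first (last name, then first name), then Family pages (title)."""
    if issue.page_type == "Person":
        return (0, issue.surname or "", issue.given or "", "")
    return (1, "", "", issue.title)


# Hypothetical sample data.
issues = [
    Issue("Family", "Family:John Smith and Mary Jones (1)"),
    Issue("Person", "Person:Mary Jones (1)", "Jones", "Mary"),
    Issue("Person", "Person:John Smith (1)", "Smith", "John"),
    Issue("Person", "Person:Ann Jones (2)", "Jones", "Ann"),
]
for issue in sorted(issues, key=list_order_key):
    print(issue.title)
# Person:Ann Jones (2)
# Person:Mary Jones (1)
# Person:John Smith (1)
# Family:John Smith and Mary Jones (1)
```

The leading 0 or 1 in each key is what keeps all Person pages ahead of all Family pages, whatever their names.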

Fixing issues

Some issues, such as date typos, can be resolved by checking sources cited on the page. Others require some research. If you find a source to support a correction, please add the source to the page.

If there is an obvious typo in a date, please don't assume that only the century digit or only the decade digit is wrong. It is common to accidentally reverse the century and decade digits, or to repeat one digit when a different digit should have been repeated. For example:

  • don't assume that 1984 should be 1884 - maybe it should be 1894
  • don't assume that 1883 should be 1783 - maybe it should be 1773
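To see why a single guess is risky, the hypothetical `typo_candidates` helper below lists the other years a mistyped year could have been meant to be, using only the typo patterns mentioned above: a wrong century or decade digit, the century and decade digits reversed, and the wrong digit repeated. It is a sketch, not a WeRelate tool, and it does not choose a correction; narrowing the list still depends on the sources.

```python
from typing import Set

DIGITS = "0123456789"


def typo_candidates(year: int) -> Set[int]:
    """List other years a mistyped four-digit year might have been meant to be.

    Only the typo patterns named on this help page are covered. The result is
    a set of possibilities to check against sources, not an answer.
    """
    s = str(year)
    if len(s) != 4:
        raise ValueError("expected a four-digit year")
    candidates = set()
    # Only the century digit (s[1]) or only the decade digit (s[2]) is wrong.
    for x in DIGITS:
        candidates.add(int(s[0] + x + s[2:]))
        candidates.add(int(s[:2] + x + s[3]))
    # Century and decade digits reversed: 1984 might be 1894.
    candidates.add(int(s[0] + s[2] + s[1] + s[3]))
    # The wrong digit was repeated: 1883 might be 1773.
    for i in (1, 2):
        if s[i] == s[i + 1]:
            for x in DIGITS:
                candidates.add(int(s[:i] + x + x + s[i + 2:]))
    candidates.discard(year)  # the year as typed is not a correction
    return candidates


# Suppose the sources place the event between 1850 and 1920 (hypothetical range).
print(sorted(c for c in typo_candidates(1984) if 1850 <= c <= 1920))
# [1884, 1894, 1904, 1914]
```

Even with a date range from the sources, several corrections can remain plausible, which is why the fix should come from a cited source rather than from the digits alone.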

Talk page

There is a Talk page associated with the Data Quality Issues page, although the link is not in the usual place. Look for it just after the date the data was last updated. The Talk page can be used to coordinate data correction efforts and to discuss the usability of the Data Quality Issues page.