How To Do Database Notation For Data Mapping

Implementing a data map in practice is a meticulous but worthwhile process that sets the foundation for effective personal data management.

Intro: Who Should Read This Tutorial?

If you’re not directly responsible for the technical implementation of your company’s data map, the following article will help you understand some key data map-related terms and processes. We start with some important definitions. These can be used to communicate more clearly with your technical resource on how best to approach the task of building a data map for your business. The rest of this tutorial will be useful for understanding the work that goes into manually mapping the data your company holds.
[su_table]

TERM | DESCRIPTION
Data Model | An organized, visual representation of the data stored in a database and how the pieces of data relate to each other.
Abstract Data Model | A simplified representation of a data model.
Data Object | A region of storage that contains a value or group of values. Each object has a defined data type (e.g., an integer, decimal number, character, or string of characters that make a word or sentence).
Data Attribute | A characteristic of a data object or a feature of a data set. For an order dataset consisting of an ORDER ID, CUSTOMER NAME, and CUSTOMER EMAIL, CUSTOMER NAME and CUSTOMER EMAIL would be the data attributes for that ORDER ID.
Data Entity | A representation in a data model of a physical or conceptual object from the real world, such as a CUSTOMER. Entities don’t represent any data themselves but are containers for attributes and relationships between objects.
Data Element | Any unit of data defined for processing (e.g., ACCOUNT NUMBER, NAME, ADDRESS, and CITY).
Data Schema | A physical implementation of a data model in a specific database management system. It includes all implementation details such as data types, constraints, and foreign or primary keys.
Data Dictionary | A set of information describing the contents, format, and structure of a database and the relationship between its elements, used to control access to and manipulation of the database. It is a reference and description of each data element.
Domain-Specific | Dedicated to or focused on a particular problem domain.

[/su_table]

Above, you can see a visual representation of some key database terms as described in this article.

If you’re the individual responsible for actually implementing the data map, the walkthrough below should get you started on the right track. The key process we examine in this piece is called Data Notation. Put simply, it’s a process that blends computer and manual methods to annotate (and then verify) entity relationships for entities that store personal data in your business. It’s the vital technical process for building a CCPA or GDPR-compliant data map. 

What Is Data Mapping?

A data map (also known as a data flow map, personally identifiable information disclosure under CCPA, or Article 30 inventory assessment under GDPR) is a clear representation of a company’s data infrastructure. It provides a record of all of the personally identifiable data points that your company processes and contains information on that data, such as what type of data it is, why it is collected, and who has access to it. 

Data mapping is the process of creating a data map. It involves identifying the elements of a business’s data model that align with a domain-specific abstract data model. Data maps are created so that domain-specific solutions can be applied to a business’s data. In short, it’s about figuring out what types of personal data your organization is processing so that the data can be managed more efficiently and effectively.

As a term closely related to data mapping, database mapping is the process of inventorying a database. Database mapping aims to document the types of personal information stored in a database and the purposes of data collection. A single business can have a variety of databases—some SQL and some NoSQL, for instance. Thus, successful database mapping requires a scalable process for inventorying a variety of database structures. Hereafter, we use the term “data mapping,” as it encompasses “database mapping.”

When visualized, a data map most often contains nodes and links to show how different systems that contain any personally identifiable information link together. An effective data map helps you stay compliant with data privacy law and allows for efficient data management across your organization. You can find out more about mapping the state and flow of personal data in our guide to building a company data map.

How to Get Started With Data Mapping

Data Mapping Tasks

There are a number of tasks involved in the mapping of a company’s data infrastructure for data privacy law compliance. These include:

  • Identifying each piece of personally identifiable information that your organization processes and recording it in one central repository;
  • Annotating entity relationships for entities that store personal data (such as identifying a ‘user’ entity and an ‘order’ entity) and recording the relationship between the two to determine all applications within your organization that have access to a user’s personal data;
  • Mapping personal data lineage, i.e., making a record of all of the paths through which personal data is transferred while it is processed by your company.
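The lineage task in the list above can be pictured as walking a graph of data transfers. The following is a minimal Python sketch under stated assumptions: the system names and the `TRANSFERS` list are hypothetical, and a real inventory would come from your own infrastructure rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical transfers of personal data between systems: (source, destination).
TRANSFERS = [
    ("signup_form", "users_db"),
    ("users_db", "orders_service"),
    ("users_db", "email_platform"),
    ("orders_service", "analytics_warehouse"),
]


def build_graph(transfers):
    """Turn (source, destination) pairs into an adjacency list."""
    graph = defaultdict(list)
    for source, destination in transfers:
        graph[source].append(destination)
    return graph


def lineage_paths(graph, start):
    """Return every path from `start` that ends where no unvisited system remains."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        # Skip systems already on this path so that cycles cannot loop forever.
        next_systems = [s for s in graph.get(path[-1], []) if s not in path]
        if not next_systems:
            paths.append(path)
        for system in next_systems:
            stack.append(path + [system])
    return sorted(paths)


graph = build_graph(TRANSFERS)
paths = lineage_paths(graph, "signup_form")
for path in paths:
    print(" -> ".join(path))
```

Each printed line is one route that personal data can take from the point of collection, which is the kind of record the lineage task asks for.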

The ideal approach combines the efficiency of automation with the reliability of a human expert.

Methods of Data Mapping

Each of the above tasks can be performed using the following methods:

  • Manual Data Mapping
    Personnel with knowledge of the business, the logical data model, and the physical structure of the data that your organization is processing record the data map by hand. This can be tedious and time-consuming, depending on the size and age of your organization when you begin the data mapping process.
  • Automated Data Mapping
    Probabilistic algorithms and trained AI models go through the labels, the data, and statistics calculated from the data to create a data map automatically. This is much quicker than manual data mapping, but the result may still require human review to confirm its validity.
  • Hybrid Data Mapping
    Automated procedures produce a baseline mapping, which can then be verified, adjusted, and approved by knowledgeable personnel within your organization. This is the ideal approach, as it is both efficient and reliable.
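As an illustration of the hybrid method, here is a minimal Python sketch of a review step applied to an automated baseline. Everything in it is an assumption made for the example: the `Suggestion` structure, the `confidence` score, and the label `location.city.name` are hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Suggestion:
    attribute: str          # e.g. "users.email"
    label: str              # data dictionary element proposed by the automated step
    confidence: float       # hypothetical score between 0 and 1
    approved: bool = False  # set to True once a reviewer signs off


def review_queue(suggestions):
    """Order suggestions by ascending confidence so reviewers see the least certain first."""
    return sorted(suggestions, key=lambda s: s.confidence)


def approve(suggestion, corrected_label=None):
    """Record a reviewer's decision, optionally replacing the suggested label."""
    return replace(
        suggestion,
        label=corrected_label or suggestion.label,
        approved=True,
    )


# Hypothetical baseline produced by an automated step.
baseline = [
    Suggestion("users.email", "person.contact.email", 0.98),
    Suggestion("users.name", "location.city.name", 0.41),
]

queue = review_queue(baseline)
reviewed = [
    approve(queue[0], corrected_label="person.name"),  # reviewer adjusts the label
    approve(queue[1]),                                 # reviewer accepts as suggested
]
for item in reviewed:
    print(item.attribute, "->", item.label)
```

Sorting by confidence is only one possible design choice; the point is that every automated suggestion still passes through a person who can verify, adjust, or approve it.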

Validating Your Data Mapping Decisions

Once you have decided on the appropriate method for implementing your data map and have compiled your first iteration, you will need to confirm that the decisions you made while completing the mapping were correct.

Depending on the information used to create your company’s data map, the analysis process can be divided into two different stages:

  1. Label Analysis
    Makes use of the information provided by schemas of the data entities. This information includes names of databases, entities and attributes. 
  2. Data Analysis
    Following label analysis, the actual data stored in the system is used to confirm that the data map has been compiled correctly. 

Label Analysis 

This step involves using a data attribute’s name to decide which data dictionary element the attribute identifies. For example, an attribute named ‘email_id’ can be identified as a ‘person.contact.email’.

When the attribute’s name alone is not enough, the name of the entity to which the attribute belongs is included in the analysis. For example, an attribute named ‘name’ in an entity named ‘customer’ can be identified as the name of a person, whereas an attribute with the same name in an entity named ‘city_master’ can be identified as the name of a city.

Sometimes a group of attributes that occur together is easier to identify than any one of them alone. For example, attributes named ‘lat’ and ‘lng’ can be identified as the latitude and longitude of a location. Similarly, attributes named ‘f_name’ and ‘l_name’ can be identified as parts of a person’s name.

#Example taxonomy - privacy data (yaml file)

person: #person’s information
    - name:
          - first_name
          - last_name
    - contact: #person’s contact information
          - mailing-address: #person’s mailing address details
                - street
                - city
                - state
                - zip
          - phone: #person’s contact phone numbers
                - mobile
                - home
                - work
          - email
    - identification: #person’s identification details
          - drivers-license:
                - number
                - state
          - passport:
                - number
                - issuing-country

#Example entity and attributes (yaml file)
entities:
    - name: users
      attributes:
          - name: id
            datatype: number
          - name: f_name
            datatype: text
          - name: l_name
            datatype: text
          - name: email
            datatype: text
          - name: street
            datatype: text
          - name: city
            datatype: text
          - name: phone
            datatype: text

#Example data mapping (yaml file)
entities:
    - name: users
      attributes:
          - name: id
            datatype: number
          - name: f_name
            datatype: text
            dataclass: person.name.first_name
          - name: l_name
            datatype: text
            dataclass: person.name.last_name
          - name: email
            datatype: text
            dataclass: person.contact.email
          - name: street
            datatype: text
            dataclass: person.contact.mailing-address.street
          - name: city
            datatype: text
            dataclass: person.contact.mailing-address.city
          - name: phone
            datatype: text
            dataclass: person.contact.phone.home
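
The label analysis described in this section could be sketched in Python roughly as follows. This is a minimal illustration, not a complete classifier: the rule tables are hypothetical, the label `location.city.name` is invented for the example, and the other labels follow the example taxonomy.

```python
# Hypothetical rules keyed on the attribute name alone.
ATTRIBUTE_RULES = {
    "f_name": "person.name.first_name",
    "l_name": "person.name.last_name",
    "email": "person.contact.email",
    "email_id": "person.contact.email",
    "street": "person.contact.mailing-address.street",
}

# Hypothetical rules for ambiguous names, keyed on (entity, attribute).
CONTEXT_RULES = {
    ("customer", "name"): "person.name",
    ("city_master", "name"): "location.city.name",  # invented label, not in the taxonomy
}


def classify(entity, attribute):
    """Suggest a data dictionary element for an attribute, or None if no rule applies."""
    key = (entity.lower(), attribute.lower())
    # Entity context takes priority because it resolves ambiguous attribute names.
    if key in CONTEXT_RULES:
        return CONTEXT_RULES[key]
    return ATTRIBUTE_RULES.get(attribute.lower())


for entity, attribute in [
    ("users", "f_name"),
    ("customer", "name"),
    ("city_master", "name"),
    ("users", "id"),
]:
    print(f"{entity}.{attribute} -> {classify(entity, attribute)}")
```

An attribute that matches no rule comes back as `None`, which marks it for manual review rather than guessing a label.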

Data Analysis 

The second stage of confirming your data map uses the actual data stored in your database to check the accuracy of the mappings produced during the label analysis stage.

The presence of certain values or patterns in the actual data can confirm certain mappings. For example, if the values of an attribute follow the pattern ‘foo@bar.com’, it can be confirmed that this attribute captures an email address.
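
A pattern check of this kind might look like the Python sketch below. The regular expression is deliberately loose, and the 0.95 threshold and the sample values are assumptions for illustration only; real data usually calls for sampling and a more careful pattern.

```python
import re

# Loose pattern: something@something.something, with no spaces or extra "@" signs.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def match_ratio(values, pattern=EMAIL_PATTERN):
    """Return the fraction of non-empty values that match the pattern."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return 0.0
    matches = sum(1 for v in non_empty if pattern.match(v))
    return matches / len(non_empty)


def looks_like_email(values, threshold=0.95):
    """Treat the attribute as an email address if enough sampled values match."""
    return match_ratio(values) >= threshold


# Hypothetical samples taken from two attributes.
email_sample = ["foo@bar.com", "jane.doe@example.org", "", None, "a@b.co"]
phone_sample = ["555-0100", "555-0101"]

print(looks_like_email(email_sample))
print(looks_like_email(phone_sample))
```

Empty and missing values are ignored so that a sparsely populated column is judged only on the values it actually holds.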

The ranges of values stored in attributes of different tables can be compared to see whether those attributes refer to the same piece of information, such as a user_id, which can then be used to link the tables with a specific relationship. Conversely, the absence of expected values in an attribute can indicate that a suspected mapping or relationship does not hold.
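
One simple way to compare values across tables is to measure how many values of one attribute also appear in the other. The Python sketch below assumes the column values have already been read into lists; the column names and sample data are hypothetical.

```python
def overlap_ratio(child_values, parent_values):
    """Fraction of distinct non-null child values that also occur in the parent column."""
    child = {v for v in child_values if v is not None}
    parent = {v for v in parent_values if v is not None}
    if not child:
        return 0.0
    return len(child & parent) / len(child)


# Hypothetical column samples.
users_id = [1, 2, 3, 4, 5]
orders_user_id = [1, 1, 3, 5, None]
orders_quantity = [10, 250, 7, 99]

print(overlap_ratio(orders_user_id, users_id))   # every order points at a known user
print(overlap_ratio(orders_quantity, users_id))  # no shared values
```

A high ratio is only suggestive, since small integers can coincide by chance, so a candidate link found this way still needs to be verified against the schema or by someone who knows the data.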

Once your data map has been compiled and validated, you should assign an individual or team to be responsible for its upkeep, because a data map is by nature a constantly changing document. The challenge now lies in maintaining your data map effectively, ensuring that it stays up to date and in compliance with any data protection or privacy regulations that may apply to your organization.

If you’d like to make short work of creating your organization’s data map, you can take a look at how Ethyca’s automated data mapping software does so, seamlessly. 
