Lesson 5: Data dictionary
Objective: Use and benefits of keeping a data dictionary (problem domain vocabulary).

Data Dictionary Benefits in the Project Life Cycle

Here's a breakdown of the use and benefits of a data dictionary for developing your problem domain vocabulary in the project initiation phase:
  1. Use of a Data Dictionary:
    • Establishing a Clear Understanding: A data dictionary acts as a centralized repository of definitions, descriptions, and relationships between all the key terms and data elements relevant to your project and its specific problem area (the problem domain). Think of it like a glossary tailored to your project's world.
    • Capturing Context: The data dictionary goes beyond simple definitions. It provides examples, explains the purpose of each data element, and documents how it should be formatted or used within your system.
    • Promoting Consistency: Each term in your data dictionary becomes the "official" reference point. This encourages everyone on the team to use the same language, reducing misunderstandings that arise when people use similar terms with slightly different meanings.
  2. Benefits of a Data Dictionary:
    • Improved Communication: A shared problem domain vocabulary, guided by the data dictionary, bridges gaps between technical teams and business stakeholders. Everyone speaks the same language.
    • Reduced Ambiguity and Errors: By clearly defining data elements, you minimize the risk of misinterpretation. This precision translates into more accurate data analysis and decision-making.
    • Faster Onboarding: A data dictionary acts as a reference guide for new team members. They can rapidly gain an understanding of the project's terminology and data structures.
    • Enhanced Documentation: Your data dictionary becomes a valuable component of your project documentation. It provides clarity for future reference or when the project gets handed off to new teams.
    • Smoother Data Integration: If your project involves combining data from different sources, the data dictionary helps identify and reconcile inconsistencies in how that data is represented.

Example: Imagine you're starting a project to analyze customer churn (when customers stop doing business with you) for an online retailer. Here's how a data dictionary might help:
Term | Definition | Example
Customer ID | A unique numerical identifier for each customer | 123456
Churn Date | The date a customer's last purchase was recorded | 03/15/2024
Subscription Type | The type of subscription plan a customer is enrolled in (e.g., monthly, annual) | Monthly
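
One way to put these three entries to work is to store them in a machine-readable form and check incoming records against them. The sketch below is only an illustration: the field names (customer_id, churn_date, subscription_type), the "type", "format", and "allowed" keys, and the validate function are all assumptions, and the MM/DD/YYYY date format is inferred from the example value above.

```python
from datetime import datetime

# Hypothetical encoding of the three example entries. The "type", "format",
# and "allowed" keys are assumptions added for illustration.
DATA_DICTIONARY = {
    "customer_id": {
        "definition": "A unique numerical identifier for each customer",
        "type": int,
    },
    "churn_date": {
        "definition": "The date a customer's last purchase was recorded",
        "type": str,
        "format": "%m/%d/%Y",  # assumed from the example value 03/15/2024
    },
    "subscription_type": {
        "definition": "The type of subscription plan a customer is enrolled in",
        "type": str,
        "allowed": {"Monthly", "Annual"},
    },
}


def validate(record):
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
            continue
        if "format" in spec:
            try:
                datetime.strptime(value, spec["format"])
            except ValueError:
                problems.append(f"{name}: does not match format {spec['format']}")
        if "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{name}: {value!r} is not an allowed value")
    return problems


good = {"customer_id": 123456, "churn_date": "03/15/2024", "subscription_type": "Monthly"}
bad = {"customer_id": "123456", "churn_date": "2024-03-15", "subscription_type": "Weekly"}
print(validate(good))  # no problems expected
print(validate(bad))   # one problem per field
```

Keeping the definitions next to the checks means a disagreement about a term surfaces as a failed validation rather than as a misunderstanding later in the project.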

Advantages of using a Data Dictionary

I ran an experiment in my instructor-led classes. In five classes, the students did not work on a data dictionary before they began object modeling. In five more classes, I required them to draft a data dictionary first. All ten classes did the exact same workshop. The first five classes, without the data dictionary, took 6 to 8 hours on average to complete the workshop object model. The second five classes, with a data dictionary already in hand, took only 3 to 4 hours.

A data dictionary is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format." The term may have one of several closely related meanings pertaining to databases and database management systems (DBMS):
  1. a document describing a database or collection of databases
  2. an integral component of a DBMS that is required to determine its structure
  3. a piece of middleware that extends or supplants the native data dictionary of a DBMS.

Humans are very good at context-specific interpretation[1] of information; computers are not. Humans can reinterpret and correct misunderstandings and miscommunications, whereas computers typically cannot. This is a critical distinction when you realize that terminology is the basis for modeling.

Example to define Project Terminology

Here is a simple example of how critical it is to define project terminology. In the course project, where you are developing an online ticket sales system for a local auditorium, the word "hold" can have a variety of possible definitions:
  1. Change the status of a seat while a customer decides whether or not to buy it
  2. Lock the database record
  3. File a ticket at the door for pick-up
  4. Suspend a transaction
  5. Keep the ticket and do not mail it because a payment bounced
How can you resolve these conflicting definitions?

Data dictionary

To resolve conflicts over the interpretation of words in everyday situations, you can pull a dictionary from the shelf and get the definition of the term in question. In a project, you can create your own dictionary. A data dictionary defines the vocabulary of the problem domain. Entries in the data dictionary can and will change over the course of a project, but they should change only when everyone involved in the project agrees to the change. Use a data dictionary to ensure consistent, high-quality communication.

Contents of your data dictionary

What belongs in a data dictionary? The examples below will give you a good starting point.
For nouns, you should include these items:

Name: Fully qualify the name with adjectives if needed to distinguish the term as applied in different contexts
Definition: Allow multiple definitions only when each context is clearly defined
Synonyms: Other terms that may be encountered during the project and that mean the same thing
Data type: What kind of data is used to describe this element?
Length: How many characters or digits are allowed?
Legacy equivalents: What field names hold this information in the old systems or in related systems?
Example of its use: Provide an example of how the users might refer to this item in conversation
Derived: Is this element derived from other information? If yes, then specify the derivation rule.

For verbs, you should include these items:
Verb: Provide the most common form, for example, "hold"
Alternative forms: Provide other ways the verb is used, for example, "held", "on hold", "holding"
Example of its use: Provide examples of how the users would use the term in the course of doing business

An example noun: ticket

Noun: ticket

Name: Ticket
Definition: A receipt documenting the purchase of a unique seat in a single show
Synonyms: Seat (the users tend to use these terms interchangeably depending on the context, for example, "that customer just purchased 4 seats")
Data type: Contains details about the seat and the show, a serial number, and some advertising on the back
Length: See data type
Legacy equivalents: The old database did not track individual tickets. This is a new feature so that we can issue tickets electronically rather than physically.
Example of its use: The system will issue a ticket when the purchase is verified. A ticket represents the purchase of a seat in one show.
Derived: Only the serial number is unique to the ticket. The serial number is assigned by incrementing the last serial number.
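
A derivation rule is precise enough when it can be written down as code. As a small sketch of the rule stated for the ticket serial number, assuming serial numbers are plain integers and that the starting value comes from wherever the last issued number is stored (the class name TicketSerialNumbers is hypothetical):

```python
class TicketSerialNumbers:
    """Issue serial numbers by incrementing the last one issued."""

    def __init__(self, last_serial=0):
        # Assumption: the caller supplies the last serial number already issued.
        self.last_serial = last_serial

    def next_serial(self):
        self.last_serial += 1
        return self.last_serial


serials = TicketSerialNumbers(last_serial=1000)
print(serials.next_serial())  # 1001
print(serials.next_serial())  # 1002
```

A production system would also need to make the increment safe when several purchases are verified at the same time; that is outside the scope of this sketch.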

An example verb: hold
Verb: "hold"
Alternative forms: "held", "on hold", "holding"
Example of its use: When a customer identifies a seat in a show that they want to purchase, that seat is held for them until they release it or the system releases it due to timeout.
A customer may hold a seat while they find the rest of the seats they want to purchase in that particular show.
Holding a seat guarantees that other people cannot take it from you while you make up your mind.
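
Once the team has agreed on this meaning of "hold", the definition translates almost directly into behavior. The sketch below is one possible reading, under stated assumptions: the SeatHolds class, its method names, and the five-minute default timeout are hypothetical, and a customer who holds the same seat again is assumed to refresh the timeout.

```python
import time


class SeatHolds:
    """Track which customer holds which seat until release or timeout."""

    def __init__(self, timeout_seconds=300.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # assumed default, not from the course
        self.clock = clock                      # injectable so the timeout can be tested
        self._holds = {}                        # seat -> (customer, expiry time)

    def _expire(self, seat):
        # The system releases a hold once its timeout has passed.
        current = self._holds.get(seat)
        if current is not None and self.clock() >= current[1]:
            del self._holds[seat]

    def hold(self, seat, customer):
        """Hold a seat for a customer; return False if someone else holds it."""
        self._expire(seat)
        current = self._holds.get(seat)
        if current is not None and current[0] != customer:
            return False
        self._holds[seat] = (customer, self.clock() + self.timeout_seconds)
        return True

    def release(self, seat, customer):
        """Release a seat the customer holds; return False if they do not hold it."""
        self._expire(seat)
        current = self._holds.get(seat)
        if current is not None and current[0] == customer:
            del self._holds[seat]
            return True
        return False

    def holder_of(self, seat):
        """Return the customer currently holding the seat, or None."""
        self._expire(seat)
        current = self._holds.get(seat)
        return current[0] if current else None


# Usage with a fake clock so the timeout can be shown without waiting.
now = [0.0]
holds = SeatHolds(timeout_seconds=10, clock=lambda: now[0])
print(holds.hold("A1", "alice"))  # True: the seat is now held for alice
print(holds.hold("A1", "bob"))    # False: other people cannot take it
now[0] = 10.0                     # the timeout passes
print(holds.hold("A1", "bob"))    # True: the system released alice's hold
```

Note that this models only the first of the candidate meanings of "hold" listed earlier (changing a seat's status while a customer decides); the other meanings would need their own terms in the data dictionary.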

Data Dictionary - Exercise

Click the Exercise link below to identify key data dictionary nouns in the course project problem statement.
Data Dictionary - Exercise

[1] context-specific interpretation: Context-specific interpretation means that the understanding of a word, phrase, or idea depends entirely on the situation in which it's used. The surrounding details, like the topic of conversation, who is speaking, or the broader setting, are essential for determining the correct meaning.
