Metadata: Identifying Entities

El Grito de Sunset Park Use Case

Step 1: IDENTIFYING ENTITIES

1.1. Brainstorming Possible Entities

The first step in designing a data model is making a list of entities, or what the database will describe. It can be useful to start with a big list and refine from there.

[Figure: example entity icons]

Entities from Questions

As previously discussed, you can start by looking at the goals of the project and what the database users want to know about. What types of persons, organizations, objects, or events are the questions about?

Berkeley Copwatch Example

Using the questions we developed with Berkeley Copwatch, we started creating a list of potential entities for their new data model. Looking at the questions and underlining the nouns (i.e. the persons, organizations, objects, or events) is a good place to start. For example:

Which videos show which incidents involving which officers?
Which officers most often use force?
What kind of force are officers using?
What are the histories of incidents for each officer?
Who else was present at an incident? What did they do?
Are officers deliberately interfering with the right to watch? Which officers?

This gives us a rough idea of what or who the questions are about. From there, we could start formulating a list of possible entities:

Question	Possible Entities
Which videos show which incidents involving which officers?	Videos, Incidents, Officers
Which officers most often use force?	Officers, Incidents (implied)
What kind of force are officers using?	Officers, Uses of Force
What are the histories of incidents for each officer?	Officers
Who else was present at an incident? What did they do?	Participants, Incidents
Are officers deliberately interfering with the right to watch? Which officers?	Officers, Incidents
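The table above can also be captured as a small data structure and tallied, which gives a quick sense of which candidate entities come up most often across the questions. This is only a sketch: the variable and function names are hypothetical, "Incidents (implied)" is counted as "Incidents", and the mapping simply restates the table.

```python
from collections import Counter

# Question -> possible entities, restating the table above.
QUESTION_ENTITIES = {
    "Which videos show which incidents involving which officers?":
        ["Videos", "Incidents", "Officers"],
    "Which officers most often use force?":
        ["Officers", "Incidents"],  # Incidents is implied by the question
    "What kind of force are officers using?":
        ["Officers", "Uses of Force"],
    "What are the histories of incidents for each officer?":
        ["Officers"],
    "Who else was present at an incident? What did they do?":
        ["Participants", "Incidents"],
    "Are officers deliberately interfering with the right to watch? Which officers?":
        ["Officers", "Incidents"],
}


def tally_entities(question_entities):
    """Count how many questions mention each candidate entity."""
    counts = Counter()
    for entities in question_entities.values():
        counts.update(entities)
    return counts


entity_counts = tally_entities(QUESTION_ENTITIES)
for entity, count in entity_counts.most_common():
    print(f"{entity}: {count} question(s)")
```

An entity that appears in many questions is a strong candidate to keep; one that appears only once is not necessarily unimportant, but it is worth a second look when refining the list.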

Entities Emerging from Reviewing Content

Another way to identify possible entities for your data model is to dig through your materials. Start gathering data to see what information is available and what information is important for your project, then sort the data into the “things” it seems to describe. For example, the logging notes for a video of an arrest might include an officer’s badge number, the location of the incident, and the name of the person arrested. You might also notice that your collection contains multiple videos of the same arrest, and that you will need to distinguish which parts of the incident are captured in which video. From this, you can already identify some possible entities for your data model:

Piece of Information noted	Entity Being Described
Officer’s badge number	Officer
Location of the incident	Incident
Name of the person arrested	Civilian
Parts of incident captured on video	Video
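As a rough illustration of this sorting step, the sketch below takes one flat row of logging notes and groups its fields by the entity each one describes. The field names, the `FIELD_TO_ENTITY` mapping, and the placeholder values are all assumptions made for the example, not part of any real spreadsheet.

```python
from collections import defaultdict

# Hypothetical column names mapped to the entity they describe,
# following the table above.
FIELD_TO_ENTITY = {
    "badge_number": "Officer",
    "incident_location": "Incident",
    "arrested_person_name": "Civilian",
    "parts_captured": "Video",
}


def sort_into_entities(note):
    """Group the fields of one flat logging note by entity."""
    grouped = defaultdict(dict)
    for field, value in note.items():
        # Fields we have not classified yet are set aside for review.
        entity = FIELD_TO_ENTITY.get(field, "Unsorted")
        grouped[entity][field] = value
    return dict(grouped)


# Placeholder values only; no real data is shown here.
sample_note = {
    "badge_number": "<badge number>",
    "incident_location": "<location>",
    "arrested_person_name": "<name>",
    "parts_captured": "<parts of the incident shown>",
    "weather": "<not yet classified>",
}

grouped = sort_into_entities(sample_note)
for entity, fields in grouped.items():
    print(entity, "->", sorted(fields))
```

Anything that lands in the "Unsorted" group is a prompt to ask whether it belongs to an existing entity, suggests a new one, or is not needed at all.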

El Grito Example

For the El Grito project, we reviewed all the videos and made notes in a spreadsheet, adding columns as we went along whenever we uncovered new types of information that were important to capture.

Here is a rough list of “things,” or entities, that seemed to be described in the metadata we were collecting:

Preliminary List of Possible Entities

Officers
Police Departments
Police Precincts
Videos
Incidents
Police Cars at Incident
Victims
Witnesses
Filmers
People who have Additional Information about Incident
Arrests Against People at Incident
News Articles
Public Events (e.g. Puerto Rican Pride Day Parade)
Actions / Misconduct / Allegations
Reports or Lawsuits Against Officers
Cataloger

1.2. Refining the Entity List

Once you have a big list of possible entities, the next step is to refine the list. Some entities might not need to be included if there isn’t data that you want to track about them, or some entities might actually work better as attributes of another entity. Some entities on the list might be very similar things, and it would make sense to collapse them into a single entity. There might be new entities you need to add to the list.

To refine your entity list, ask yourself:

Does this entity need to be included? Is there going to be data about this thing that we want to track and that serves the purpose of this project? Are there other entities that we need to add?
Is this entity similar to other entities on the list? Can they be consolidated?
Do I want to align my entities with an external standard or someone else’s database? (NB: They don’t have to be exactly the same, as long as the structures can be mapped to one another in some way.)

There are no hard and fast rules about what should be an entity. As a rule of thumb, the simpler the better: focus on the goals of the database and don’t create entities you don’t need, or you will create a lot of extra data entry work down the line. As you review and test, you might need to revise your model so that it works best for your needs.

El Grito Example

On the big preliminary list above, we have “Police Department” and “Police Precinct” as possible entities. But in our project, the police department is always going to be “NYPD” and there isn’t data about the department per se that we want to track. Meanwhile, we do want to keep track of data about police precincts, like which officers belonged to which precinct and when. So it would make sense to keep “Police Precinct” as an entity, and just make “Police Department” an attribute of “Police Precinct.”
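One way to picture this decision is as a record type in which the department is just a field on the precinct rather than a record of its own. The sketch below is illustrative only; the class and field names are hypothetical, and the precinct name is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class PolicePrecinct:
    """A precinct is an entity; its department is only an attribute."""
    precinct_name: str
    # Always "NYPD" in this project, so a default value is enough
    # and no separate Police Department entity is needed.
    department: str = "NYPD"


precinct = PolicePrecinct(precinct_name="<precinct name>")
print(precinct)
```

If the project later needed to track data about departments themselves, the attribute could be promoted to its own entity at that point.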

While researching officers, we learned that officers do not have permanent shield numbers; an officer’s shield number can change over time. So we needed to create an additional entity for Shield History in order to track the start and end dates of each officer’s shield numbers. We also learned that officers often received promotions and salary increases over time, despite their involvement in numerous incidents and alleged misconduct. This information is important to track, so we also needed to create entities for Rank History and Salary History in order to track when those promotions and salary increases took place. (Shield History, Rank History, and Salary History are considered weak entities because they depend on the Officers entity to have meaning. If it’s unclear why they need to be separate entities instead of just attributes of the Officers entity, the Identifying Attributes section goes into more detail.)
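To make the idea of a weak entity more concrete, here is a minimal sketch of how Shield History might look as a table that depends on an Officers table, using Python's built-in sqlite3 module. The table names, column names, shield numbers, and dates are all invented for illustration; this is one possible layout under those assumptions, not the schema used in the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE officers (
    officer_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- Weak entity: a shield record has no meaning without its officer.
CREATE TABLE shield_history (
    officer_id    INTEGER NOT NULL REFERENCES officers(officer_id),
    shield_number TEXT NOT NULL,
    start_date    TEXT NOT NULL,   -- ISO dates sort correctly as text
    end_date      TEXT,            -- NULL means still in use
    PRIMARY KEY (officer_id, start_date)
);
""")

# Placeholder data, not real officers or shield numbers.
conn.execute("INSERT INTO officers (officer_id, name) VALUES (1, 'Officer A')")
conn.executemany(
    "INSERT INTO shield_history VALUES (?, ?, ?, ?)",
    [(1, "1111", "2010-01-01", "2014-12-31"),
     (1, "2222", "2015-01-01", None)],
)


def shield_on(conn, officer_id, date):
    """Return the shield number an officer held on a given ISO date, if any."""
    row = conn.execute(
        """SELECT shield_number FROM shield_history
           WHERE officer_id = ? AND start_date <= ?
             AND (end_date IS NULL OR end_date >= ?)""",
        (officer_id, date, date),
    ).fetchone()
    return row[0] if row else None


print(shield_on(conn, 1, "2012-06-01"))
print(shield_on(conn, 1, "2020-01-01"))
```

Because each shield record carries its own start and end dates, the same officer can be matched to different shield numbers in videos from different years, which a single "shield number" attribute on the officer could not support.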

Examining our list further, “Victims,” “Witnesses,” “Filmers,” and “People who have Additional Information about Incident” could really all be collectively called “People” instead of having separate entities for each type of person. Whether they are victims, witnesses, or filmers could be designated using “Role” as an attribute for a person in relation to a specific incident. This especially makes sense because a single person might be a victim in one incident, a witness in another, and a filmer in yet another, and we want all of these instances to relate to the same Person, rather than to a separate Victim, Witness, or Filmer.

Advanced Note: Associative Entities

The “Role” attribute mentioned above introduces a need for another type of entity called an “associative entity.” An associative entity is an entity that exists at the intersection of two other entities, when there is data about that intersection that needs to be tracked. For example, in this case, a person’s role in an incident is neither the sole property of the person (like their name or address would be) nor of the incident (like the date and location of the incident would be); it’s an “in-between” property of what you might call “Person at Incident.”
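A small sketch may help show where “Role” lives. Below, the role is stored on a separate “Person at Incident” record that points to both a person and an incident, so the same person can hold different roles in different incidents. The class names, field names, IDs, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str  # a property of the person alone


@dataclass(frozen=True)
class PersonAtIncident:
    """Associative entity: one person's involvement in one incident."""
    person_id: int
    incident_id: int  # refers to a (hypothetical) Incident record
    role: str         # e.g. "victim", "witness", "filmer"


def roles_for(person_id, links):
    """List (incident_id, role) pairs for one person across all incidents."""
    return sorted(
        (link.incident_id, link.role)
        for link in links
        if link.person_id == person_id
    )


# Placeholder data only.
person = Person(person_id=1, name="Person A")
links = [
    PersonAtIncident(person_id=1, incident_id=101, role="victim"),
    PersonAtIncident(person_id=1, incident_id=102, role="witness"),
    PersonAtIncident(person_id=1, incident_id=103, role="filmer"),
    PersonAtIncident(person_id=2, incident_id=101, role="witness"),
]

print(roles_for(person.person_id, links))
```

Note that the Person record itself never changes as roles are added; only new “Person at Incident” records are created.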

Similarly, we want to track the dates that an officer served in a particular precinct. But “Start Date” and “End Date” are not really properties of either the “Officer” or the “Police Precinct,” but rather of “Officer at Precinct.” So there seems to be a need for a number of associative entities in our data model!
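The “Officer at Precinct” idea can be sketched as a linking table that references both an officer and a precinct and carries the start and end dates itself. The example below uses Python's built-in sqlite3 module; all table names, column names, and rows are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
CREATE TABLE officers (
    officer_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE police_precincts (
    precinct_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Associative entity: the dates belong to the pairing, not to either side.
CREATE TABLE officers_at_precincts (
    officer_id  INTEGER NOT NULL REFERENCES officers(officer_id),
    precinct_id INTEGER NOT NULL REFERENCES police_precincts(precinct_id),
    start_date  TEXT NOT NULL,
    end_date    TEXT,              -- NULL means the assignment is ongoing
    PRIMARY KEY (officer_id, precinct_id, start_date)
);
""")

# Placeholder rows, not real officers or precincts.
db.executemany("INSERT INTO officers VALUES (?, ?)",
               [(1, "Officer A"), (2, "Officer B")])
db.executemany("INSERT INTO police_precincts VALUES (?, ?)",
               [(1, "Precinct X"), (2, "Precinct Y")])
db.executemany("INSERT INTO officers_at_precincts VALUES (?, ?, ?, ?)",
               [(1, 1, "2010-01-01", "2012-12-31"),
                (1, 2, "2013-01-01", None),
                (2, 1, "2011-01-01", None)])


def officers_at(db, precinct_id, date):
    """Names of officers assigned to a precinct on a given ISO date."""
    rows = db.execute(
        """SELECT o.name
           FROM officers_at_precincts AS oap
           JOIN officers AS o ON o.officer_id = oap.officer_id
           WHERE oap.precinct_id = ? AND oap.start_date <= ?
             AND (oap.end_date IS NULL OR oap.end_date >= ?)
           ORDER BY o.name""",
        (precinct_id, date, date),
    ).fetchall()
    return [name for (name,) in rows]


print(officers_at(db, 1, "2011-06-01"))
print(officers_at(db, 1, "2014-01-01"))
```

With this layout, an officer who moves between precincts simply gains a new row in the linking table, and the history of earlier assignments is preserved.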

Associative entities are also used when a “many-to-many” relationship exists — this is explained further in the Identifying Relationships section.

Based on all these considerations, here is a revised list of entities (which may be subject to still more revision!):

Revised List of Entities

Officers
Officer Shields
Officer Ranks
Officer Salaries
Police Precincts
Officers at Police Precincts
Officers at Incidents
Complaints or Lawsuits
Incidents
News Articles
Images
Incidents on Video
Videos
Video Parts (e.g. timecoded log)
People
People at Incidents

TERMINOLOGY

Weak Entities are entities that depend on another (strong) entity for their reason to exist. For example, “Rank” can be an entity if you want to track data like the start and end date of an officer’s rank. But “Rank” only has meaning if it’s associated with an Officer (whereas Officer, as an entity, can stand on its own), so “Rank” is a weak entity.

Associative Entities are entities that represent a situational intersection between two other entities, for which there is data to be tracked. For example, if Officers and Incidents are both entities, there could be an associative entity called “Officers at Incidents” where attributes about the officer’s presence at the incident would be tracked, like “Officer’s Actions” or their “Squad Car Number.”