Toolkit Overview : Define : Plan : Gather : Preserve : METADATA : Storytelling : Share : Recommendations
El Grito de Sunset Park Use Case
Step 2: IDENTIFYING ATTRIBUTES
Now that you have a tentative list of entities, the next step is to decide what attributes, or properties, you want to track about each of those entities. You can think of attributes as the “fields” in a database, or a “column” in a spreadsheet. For example, for the entity Officer, you might have “First Name,” “Last Name,” and “Tax ID” as attributes.
Some things to consider when deciding on attributes:
- Do you want to make your attributes align or match up with someone else’s database, so you can share data? They don’t have to be exactly the same, as long as they can map one to another.
- How granular do you want to make the attributes? Granularity refers to the degree to which your data is broken up into the smallest possible pieces. For example, having separate fields for “Street Number,” “Street Name,” “City,” and “Zipcode” is more granular than having one single field for “Address.” Generally, the more granular the data, the easier the data are to sort and analyze. Granular data are also more flexible — it’s easier to combine granular data together than to break “un-granular” data apart. However, the more granular your data, the more time and care you need to put into creating and entering them.
- What information do you actually have? You can make up all kinds of attributes, but it’s not going to be very useful if you don’t actually have the data to populate those attributes in the database.
For the most part, if you have a good set of entities, identifying attributes for your data model is fairly straightforward.
Some traps to avoid when identifying attributes for your entities:
When mapping out your attributes (like in the El Grito Example below), the most important thing is to make sure each attribute belongs to the entity it is actually describing. For example, say you want a timeline of officers’ job promotions with start- and end-dates that they served at each rank. Are those start- and end-dates attributes of the “Officer” entity? It might seem so, but in fact those dates are descriptors of the officer’s rank history and not the officer per se.
Let’s look at what would happen if you assigned Rank Start- and End-dates to the “Officer” entity:
OFFICERS (BAD EXAMPLE)
Officer ID | Officer First Name | Officer Last Name | Rank Level | Rank Start Date | Rank End Date |
1 | Sam | Green | Officer | 2010-01-01 | 2015-01-01 |
2 | Jo | Brown | Officer | 2009-09-30 | 2017-08-28 |
3 | Jean | Blue | Sergeant | 2015-04-09 | 2017-08-28 |
4 | Sam | Green | Detective | 2015-01-02 | 2017-07-15 |
Since an officer might have multiple ranks during their career, you would end up with duplicate entries for individual officers in your database. You want to avoid a situation like this.
Ideally, you should instead have two related tables in your database, for “Officer,” and “Officer Rank History,” like this:
OFFICERS
Officer ID | Officer First Name | Officer Last Name |
1 | Sam | Green |
2 | Jo | Brown |
3 | Jean | Blue |
OFFICER RANK HISTORY
Officer Rank ID | (Related Officer ID) | Rank Level | Rank Start Date | Rank End Date |
1 | 1 | Officer | 2010-01-01 | 2015-01-01 |
2 | 2 | Officer | 2009-09-30 | 2017-08-28 |
3 | 3 | Sergeant | 2015-04-09 | 2017-08-28 |
4 | 1 | Detective | 2015-01-02 | 2017-07-15 |
By having separate tables for “Officer” and “Officer Rank History,” like above, you can record information about an officer’s changes in rank over time in your database without creating duplicate Officer records.
Another important data modeling principle (for relational databases, less so for NoSQL ones) is, in a given record, there should only be one value (i.e. entry) per attribute. Keyword- or tag-type attributes often involve multiple values. For example, a policing incident might often involve more than one type of misconduct or abuse that you will want to tag. Here are two common but not-so-good ways to handle multiple values for an attribute:
INCIDENT MISCONDUCT (BAD EXAMPLE 1: LISTING DIVISIBLE VALUES)
Incident ID | Incident | Misconduct Type |
1 | Puerto Rican Day Parade | Harassment, Pepper Spray, Punch/Kick |
2 | Arrest Outside Sunset Park Restaurant | False Arrest, Excessive Force, Battery |
3 | Pregnant Woman Assaulted | Excessive Force, Battery, False Arrest |
INCIDENT MISCONDUCT (BAD EXAMPLE 2: REPEATING ATTRIBUTES)
Incident ID | Incident | Misconduct 1 | Misconduct 2 | Misconduct 3 |
1 | Puerto Rican Day Parade | Harassment | Pepper Spray | Punch/Kick |
2 | Arrest Outside Restaurant | False Arrest | Excessive Force | Battery |
3 | Pregnant Woman Assaulted | Excessive Force | Battery | False Arrest |
In both examples above, there are multiple Misconduct Types for each record. In the first example, the entries work as “free text” but can’t be sorted by each misconduct type effectively. In the second example, the attribute is repeated so that each misconduct is in its own column. This also creates a sorting problem — notice that Incidents 2 and 3 actually involve the same three types of misconduct, but were entered in a different order, so the columns don’t match up.
The ideal way to handle situations where there can be multiple values for an attribute is to make the attribute its own entity, and make an additional associative entity to relate them. So for the example above, a more ideal structure for a relational database would be:
INCIDENTS
Incident ID | Incident |
1 | Puerto Rican Day Parade |
2 | Arrest Outside Sunset Park Restaurant |
3 | Pregnant Woman Assaulted |
MISCONDUCT TYPES
Misconduct Type ID | Misconduct Type |
1 | Harassment |
2 | Pepper Spray |
3 | Punch/Kick |
4 | False Arrest |
5 | Excessive Force |
6 | Battery |
MISCONDUCT TAGS
Tag ID | Incident ID | Misconduct Type ID |
1 | 1 | 1 |
2 | 1 | 2 |
3 | 1 | 3 |
4 | 2 | 4 |
5 | 2 | 5 |
6 | 2 | 6 |
7 | 3 | 4 |
8 | 3 | 5 |
9 | 3 | 6 |
The “Misconduct Tags” table above only has one value per attribute, so it enables you to sort by either Incident or by Misconduct.
El Grito Example
To determine the attributes in the El Grito model, we looked at the types of information that the team was including in the spreadsheet and the officer dossier. We also looked at what attributes other groups were using in their metadata models.
Here is a preliminary list of attributes for each entity. Any attributes that are also entities are noted in parentheses (more on this in the next section on Relationships):
Entities | Preliminary List of Attributes |
Officers | First Name |
Last Name | |
Tax ID | |
Incident Count | |
Officer Shield History | (Officer) |
Shield Number | |
Is Current? | |
Shield Start Date | |
Shield End Date | |
Officer Rank History | (Officer) |
Rank | |
Is Current? | |
Rank Start Date | |
Rank End Date | |
Officer Salary History | (Officer) |
Salary | |
Is Current? | |
Salary Start Date | |
Salary End Date | |
Officers at Police Precincts History | (Officer) |
(Precinct) | |
Is Current? | |
Start Date | |
End Date | |
Police Precincts | Precinct Number |
Department Name | |
Officers at Incidents | (Officer) |
(Incident) | |
Officer Actions | |
Physical Appearance | |
Squad Car Number | |
Squad Car Plates | |
(Complaint or Lawsuit) | |
Complaints or Lawsuits Related to Incident | (Incident) |
Court or Board | |
Case Number | |
Filing Date | |
Defendant | |
Plaintiff | |
Cause of Action | |
Type | |
NYSCEF Link | |
Justia Link | |
Incidents | Incident Name |
Related Event | |
Location | |
Date | |
Time of Day | |
Type of Incident | |
Description | |
Alleged Misconduct | |
Images | (Incident) |
Thumbnail | |
URL | |
Caption | |
News Articles | (Incident) |
News Source | |
Headline | |
Summary | |
URL | |
Incidents on Video | (Incident) |
(Video) | |
Videos | Title Online |
URL | |
Filename | |
Bag ID | |
Filepath of Archived Copy | |
File Modification Date | |
Duration | |
Filesize | |
Version | |
Source | |
Description | |
Logger | |
Logging Notes | |
Incrimination Flag | |
Rightsholder | |
Terms of Use | |
Video Parts | (Video) |
Timecode In | |
Timecode Out | |
Description | |
Video Tags | (Video) |
Video Keyword | |
People | First Name |
Last Name | |
Contact Info | |
Relevant Info | |
People at Incidents | (Person) |
(Incident) | |
Role in Incident | |
Information Person Provided | |
Arrests or Cases Against Person | |
Outcome of Cases Against Person |