Toolkit Overview    :    Define    :    Plan    :    Gather    :    Preserve    :    METADATA    :    Storytelling   :   Share     :    Recommendations


El Grito de Sunset Park Use Case

Step 2: IDENTIFYING ATTRIBUTES

Now that you have a tentative list of entities, the next step is to decide what attributes, or properties, you want to track about each of those entities. You can think of attributes as the “fields” in a database, or a “column” in a spreadsheet.  For example, for the entity Officer, you might have “First Name,” “Last Name,” and “Tax ID” as attributes.

attribute example icons

Some things to consider when deciding on attributes:
  • Do you want to make your attributes align or match up with someone else’s database, so you can share data?  They don’t have to be exactly the same, as long as they can map one to another.
  • How granular do you want to make the attributes? Granularity refers to the degree to which your data is broken up into the smallest possible pieces. For example, having separate fields for “Street Number,” “Street Name,” “City,” and “Zipcode” is more granular than having one single field for “Address.” Generally, the more granular the data, the easier the data are to sort and analyze. Granular data are also more flexible — it’s easier to combine granular data together than to break “un-granular” data apart. However, the more granular your data, the more time and care you need to put into creating and entering them.
  • What information do you actually have? You can make up all kinds of attributes, but it’s not going to be very useful if you don’t actually have the data to populate those attributes in the database.

For the most part, if you have a good set of entities, identifying attributes for your data model is fairly straightforward.

Some traps to avoid when identifying attributes for your entities:

When mapping out your attributes (like in the El Grito Example below), the most important thing is to make sure each attribute belongs to the entity it is actually describing. For example, say you want a timeline of officers’ job promotions with start- and end-dates that they served at each rank. Are those start- and end-dates attributes of the “Officer” entity? It might seem so, but in fact those dates are descriptors of the officer’s rank history and not the officer per se.   

Let’s look at what would happen if you assigned Rank Start- and End-dates to the “Officer” entity:

OFFICERS (BAD EXAMPLE)
Officer ID Officer First Name Officer Last Name Rank Level Rank Start Date Rank End Date
1 Sam Green Officer 2010-01-01 2015-01-01
2 Jo Brown Officer 2009-09-30 2017-08-28
3 Jean Blue Sergeant 2015-04-09 2017-08-28
4 Sam Green Detective 2015-01-02 2017-07-15

Since an officer might have multiple ranks during their career, you would end up with duplicate entries for individual officers in your database. You want to avoid a situation like this.

Ideally, you should instead have two related tables in your database, for “Officer,” and “Officer Rank History,” like this:

OFFICERS
Officer ID Officer First Name Officer Last Name
1 Sam Green
2 Jo Brown
3 Jean Blue
OFFICER RANK HISTORY
Officer Rank ID (Related Officer ID) Rank Level Rank Start Date Rank End Date
1 1 Officer 2010-01-01 2015-01-01
2 2 Officer 2009-09-30 2017-08-28
3 3 Sergeant 2015-04-09 2017-08-28
4 1 Detective 2015-01-02 2017-07-15

By having separate tables for “Officer” and “Officer Rank History,” like above, you can record information about an officer’s changes in rank over time in your database without creating duplicate Officer records.

Another important data modeling principle (for relational databases, less so for NoSQL ones) is, in a given record, there should only be one value (i.e. entry) per attribute.  Keyword- or tag-type attributes often involve multiple values. For example, a policing incident might often involve more than one type of misconduct or abuse that you will want to tag. Here are two common but not-so-good ways to handle multiple values for an attribute:

INCIDENT MISCONDUCT (BAD EXAMPLE 1: LISTING DIVISIBLE VALUES)
Incident ID Incident Misconduct Type
1 Puerto Rican Day Parade Harassment, Pepper Spray, Punch/Kick
2 Arrest Outside Sunset Park Restaurant False Arrest, Excessive Force, Battery
3 Pregnant Woman Assaulted Excessive Force, Battery, False Arrest
INCIDENT MISCONDUCT (BAD EXAMPLE 2: REPEATING ATTRIBUTES)
Incident ID Incident Misconduct 1 Misconduct 2 Misconduct 3
1 Puerto Rican Day Parade Harassment Pepper Spray Punch/Kick
2 Arrest Outside Restaurant False Arrest Excessive Force Battery
3 Pregnant Woman Assaulted Excessive Force Battery False Arrest

In both examples above, there are multiple Misconduct Types for each record. In the first example, the entries work as “free text” but can’t be sorted by each misconduct type effectively. In the second example, the attribute is repeated so that each misconduct is in its own column. This also creates a sorting problem — notice that Incidents 2 and 3 actually involve the same three types of misconduct, but were entered in a different order, so the columns don’t match up.  

The ideal way to handle situations where there can be multiple values for an attribute is to make the attribute its own entity, and make an additional associative entity to relate them. So for the example above, a more ideal structure for a relational database would be:

INCIDENTS
Incident ID Incident
1 Puerto Rican Day Parade
2 Arrest Outside Sunset Park Restaurant
3 Pregnant Woman Assaulted
MISCONDUCT TYPES
Misconduct Type ID Misconduct Type
1 Harassment
2 Pepper Spray
3 Punch/Kick
4 False Arrest
5 Excessive Force
6 Battery
MISCONDUCT TAGS
Tag ID Incident ID Misconduct Type ID
1 1 1
2 1 2
3 1 3
4 2 4
5 2 5
6 2 6
7 3 4
8 3 5
9 3 6

The “Misconduct Tags” table above only has one value per attribute, so it enables you to sort by either Incident or by Misconduct.

El Grito Example

To determine the attributes in the El Grito model, we looked at the types of information that the team was including in the spreadsheet and the officer dossier.  We also looked at what attributes other groups were using in their metadata models.

Here is a preliminary list of attributes for each entity. Any attributes that are also entities are noted in parentheses (more on this in the next section on Relationships):

Entities Preliminary List of Attributes
Officers First Name
Last Name
Tax ID
Incident Count
Officer Shield History (Officer)
Shield Number
Is Current?
Shield Start Date
Shield End Date
Officer Rank History (Officer)
Rank
Is Current?
Rank Start Date
Rank End Date
Officer Salary History (Officer)
Salary
Is Current?
Salary Start Date
Salary End Date
Officers at Police Precincts History (Officer)
(Precinct)
Is Current?
Start Date
End Date
Police Precincts Precinct Number
Department Name
Officers at Incidents (Officer)
(Incident)
Officer Actions
Physical Appearance
Squad Car Number
Squad Car Plates
(Complaint or Lawsuit)
Complaints or Lawsuits Related to Incident (Incident)
Court or Board
Case Number
Filing Date
Defendant
Plaintiff
Cause of Action
Type
NYSCEF Link
Justia Link
Incidents Incident Name
Related Event
Location
Date
Time of Day
Type of Incident
Description
Alleged Misconduct
Images (Incident)
Thumbnail
URL
Caption
News Articles (Incident)
News Source
Headline
Summary
URL
Incidents on Video (Incident)
(Video)
Videos Title Online
URL
Filename
Bag ID
Filepath of Archived Copy
File Modification Date
Duration
Filesize
Version
Source
Description
Logger
Logging Notes
Incrimination Flag
Rightsholder
Terms of Use
Video Parts (Video)
Timecode In
Timecode Out
Description
Video Tags (Video)
Video Keyword
People First Name
Last Name
Contact Info
Relevant Info
People at Incidents (Person)
(Incident)
Role in Incident
Information Person Provided
Arrests or Cases Against Person
Outcome of Cases Against Person