Metadata: Identifying Attributes

El Grito de Sunset Park Use Case

Step 2: IDENTIFYING ATTRIBUTES

Now that you have a tentative list of entities, the next step is to decide what attributes, or properties, you want to track about each of those entities. You can think of attributes as the “fields” in a database, or a “column” in a spreadsheet. For example, for the entity Officer, you might have “First Name,” “Last Name,” and “Tax ID” as attributes.

attribute example icons

Some things to consider when deciding on attributes:

Do you want to make your attributes align or match up with someone else’s database, so you can share data? They don’t have to be exactly the same, as long as they can map one to another.
How granular do you want to make the attributes? Granularity refers to the degree to which your data is broken up into the smallest possible pieces. For example, having separate fields for “Street Number,” “Street Name,” “City,” and “Zipcode” is more granular than having one single field for “Address.” Generally, the more granular the data, the easier the data are to sort and analyze. Granular data are also more flexible — it’s easier to combine granular data together than to break “un-granular” data apart. However, the more granular your data, the more time and care you need to put into creating and entering them.

What information do you actually have? You can make up all kinds of attributes, but it’s not going to be very useful if you don’t actually have the data to populate those attributes in the database.

For the most part, if you have a good set of entities, identifying attributes for your data model is fairly straightforward.

Some traps to avoid when identifying attributes for your entities:

When mapping out your attributes (like in the El Grito Example below), the most important thing is to make sure each attribute belongs to the entity it is actually describing. For example, say you want a timeline of officers’ job promotions with start- and end-dates that they served at each rank. Are those start- and end-dates attributes of the “Officer” entity? It might seem so, but in fact those dates are descriptors of the officer’s rank history and not the officer per se.

Let’s look at what would happen if you assigned Rank Start- and End-dates to the “Officer” entity:

OFFICERS (BAD EXAMPLE)

Officer ID	Officer First Name	Officer Last Name	Rank Level	Rank Start Date	Rank End Date
1	Sam	Green	Officer	2010-01-01	2015-01-01
2	Jo	Brown	Officer	2009-09-30	2017-08-28
3	Jean	Blue	Sergeant	2015-04-09	2017-08-28
4	Sam	Green	Detective	2015-01-02	2017-07-15

Since an officer might have multiple ranks during their career, you would end up with duplicate entries for individual officers in your database. You want to avoid a situation like this.

Ideally, you should instead have two related tables in your database, for “Officer,” and “Officer Rank History,” like this:

OFFICERS

Officer ID	Officer First Name	Officer Last Name
1	Sam	Green
2	Jo	Brown
3	Jean	Blue

OFFICER RANK HISTORY

Officer Rank ID	(Related Officer ID)	Rank Level	Rank Start Date	Rank End Date
1	1	Officer	2010-01-01	2015-01-01
2	2	Officer	2009-09-30	2017-08-28
3	3	Sergeant	2015-04-09	2017-08-28
4	1	Detective	2015-01-02	2017-07-15

By having separate tables for “Officer” and “Officer Rank History,” like above, you can record information about an officer’s changes in rank over time in your database without creating duplicate Officer records.

Another important data modeling principle (for relational databases, less so for NoSQL ones) is, in a given record, there should only be one value (i.e. entry) per attribute. Keyword- or tag-type attributes often involve multiple values. For example, a policing incident might often involve more than one type of misconduct or abuse that you will want to tag. Here are two common but not-so-good ways to handle multiple values for an attribute:

INCIDENT MISCONDUCT (BAD EXAMPLE 1: LISTING DIVISIBLE VALUES)

Incident ID	Incident	Misconduct Type
1	Puerto Rican Day Parade	Harassment, Pepper Spray, Punch/Kick
2	Arrest Outside Sunset Park Restaurant	False Arrest, Excessive Force, Battery
3	Pregnant Woman Assaulted	Excessive Force, Battery, False Arrest

INCIDENT MISCONDUCT (BAD EXAMPLE 2: REPEATING ATTRIBUTES)

Incident ID	Incident	Misconduct 1	Misconduct 2	Misconduct 3
1	Puerto Rican Day Parade	Harassment	Pepper Spray	Punch/Kick
2	Arrest Outside Restaurant	False Arrest	Excessive Force	Battery
3	Pregnant Woman Assaulted	Excessive Force	Battery	False Arrest

In both examples above, there are multiple Misconduct Types for each record. In the first example, the entries work as “free text” but can’t be sorted by each misconduct type effectively. In the second example, the attribute is repeated so that each misconduct is in its own column. This also creates a sorting problem — notice that Incidents 2 and 3 actually involve the same three types of misconduct, but were entered in a different order, so the columns don’t match up.

The ideal way to handle situations where there can be multiple values for an attribute is to make the attribute its own entity, and make an additional associative entity to relate them. So for the example above, a more ideal structure for a relational database would be:

INCIDENTS

Incident ID	Incident
1	Puerto Rican Day Parade
2	Arrest Outside Sunset Park Restaurant
3	Pregnant Woman Assaulted

MISCONDUCT TYPES

Misconduct Type ID	Misconduct Type
1	Harassment
2	Pepper Spray
3	Punch/Kick
4	False Arrest
5	Excessive Force
6	Battery

MISCONDUCT TAGS

Tag ID	Incident ID	Misconduct Type ID
1	1	1
2	1	2
3	1	3
4	2	4
5	2	5
6	2	6
7	3	4
8	3	5
9	3	6

The “Misconduct Tags” table above only has one value per attribute, so it enables you to sort by either Incident or by Misconduct.

El Grito Example

To determine the attributes in the El Grito model, we looked at the types of information that the team was including in the spreadsheet and the officer dossier. We also looked at what attributes other groups were using in their metadata models.

Here is a preliminary list of attributes for each entity. Any attributes that are also entities are noted in parentheses (more on this in the next section on Relationships):

Entities	Preliminary List of Attributes
Officers	First Name
	Last Name
	Tax ID
	Incident Count
Officer Shield History	(Officer)
	Shield Number
	Is Current?
	Shield Start Date
	Shield End Date
Officer Rank History	(Officer)
	Rank
	Is Current?
	Rank Start Date
	Rank End Date
Officer Salary History	(Officer)
	Salary
	Is Current?
	Salary Start Date
	Salary End Date
Officers at Police Precincts History	(Officer)
	(Precinct)
	Is Current?
	Start Date
	End Date
Police Precincts	Precinct Number
	Department Name
Officers at Incidents	(Officer)
	(Incident)
	Officer Actions
	Physical Appearance
	Squad Car Number
	Squad Car Plates
	(Complaint or Lawsuit)
Complaints or Lawsuits Related to Incident	(Incident)
	Court or Board
	Case Number
	Filing Date
	Defendant
	Plaintiff
	Cause of Action
	Type
	NYSCEF Link
	Justia Link
Incidents	Incident Name
	Related Event
	Location
	Date
	Time of Day
	Type of Incident
	Description
	Alleged Misconduct
Images	(Incident)
	Thumbnail
	URL
	Caption
News Articles	(Incident)
	News Source
	Headline
	Summary
	URL
Incidents on Video	(Incident)
	(Video)
Videos	Title Online
	URL
	Filename
	Bag ID
	Filepath of Archived Copy
	File Modification Date
	Duration
	Filesize
	Version
	Source
	Description
	Logger
	Logging Notes
	Incrimination Flag
	Rightsholder
	Terms of Use
Video Parts	(Video)
	Timecode In
	Timecode Out
	Description
Video Tags	(Video)
	Video Keyword
People	First Name
	Last Name
	Contact Info
	Relevant Info
People at Incidents	(Person)
	(Incident)
	Role in Incident
	Information Person Provided
	Arrests or Cases Against Person
	Outcome of Cases Against Person