The Sequel:
Key Problems
- Entity resolution
- Regulatory hurdles
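Entity Resolution
Two LEA (local education agency) records with the same birth date, sex, and "L" code but different name spellings, which may refer to the same student:
LEA: Jorge Castillo-Estrada | 9/30/1997 | M | L
LEA: George Castillo | 9/30/1997 | M | L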
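One way to make the comparison above concrete is a rule that requires exact agreement on birth date, sex, and the "L" field, and only a fuzzy agreement on the name. The sketch below is an illustration under stated assumptions, not the method used in the project: the `StudentRecord` type, the tiny `NAME_VARIANTS` table, the hyphenated-surname rule, and the 0.85 threshold are all hypothetical choices, and the meaning of the "L" field is not stated in the source.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class StudentRecord:
    first: str
    last: str
    dob: str   # birth date as recorded, e.g. "9/30/1997"
    sex: str
    code: str  # the unlabeled "L" field; its meaning is not given in the source


# Hypothetical first-name variant table; a real system would need a far larger one.
NAME_VARIANTS = {"GEORGE": "JORGE", "BOB": "ROBERT", "KATE": "KATHERINE"}


def normalize_first(name: str) -> str:
    """Upper-case a first name and map known variants to one canonical form."""
    n = name.strip().upper()
    return NAME_VARIANTS.get(n, n)


def last_name_similarity(a: str, b: str) -> float:
    """Best of the full-surname ratio and the ratio of the first hyphenated part."""
    a_u, b_u = a.strip().upper(), b.strip().upper()
    full = SequenceMatcher(None, a_u, b_u).ratio()
    head = SequenceMatcher(None, a_u.split("-")[0], b_u.split("-")[0]).ratio()
    return max(full, head)


def likely_same_student(r1: StudentRecord, r2: StudentRecord,
                        threshold: float = 0.85) -> bool:
    """Exact match on demographics, fuzzy match on name (hypothetical rule)."""
    if (r1.dob, r1.sex, r1.code) != (r2.dob, r2.sex, r2.code):
        return False
    if normalize_first(r1.first) != normalize_first(r2.first):
        return False
    return last_name_similarity(r1.last, r2.last) >= threshold
```

With the two LEA records above, "George" normalizes to "JORGE" and "Castillo-Estrada" shares its first surname component with "Castillo", so the rule reports a likely match. A rule this simple would also merge genuinely different students who happen to share these fields, which is why the name-count figures on the next slide matter.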
Name Counts
Student Count | First Name | Last Name
64 | JOSHUA | SMITH
56 | ASHLEY | SMITH
52 | JESSICA | SMITH
48 | JUSTIN | SMITH
37 | ASHLEY | JONES
31 | JUSTIN | WILLIAMS
30 | JESSICA | JOHNSON
27 | JOSHUA | BROWN
There are roughly 55,000 unique first names and 40,000 unique last names among Arkansas students. About 20% of Arkansas students share both their first and last name with at least one other student.
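The figures above (unique first and last names, and the share of students whose full name is not unique) can be computed with a simple frequency count. This is a minimal sketch assuming the roster is available as a list of `(first, last)` pairs and that names are compared case-insensitively; the function names are hypothetical.

```python
from collections import Counter


def shared_full_name_rate(students):
    """Fraction of students whose (first, last) pair belongs to more than one student."""
    if not students:
        return 0.0
    counts = Counter((f.strip().upper(), l.strip().upper()) for f, l in students)
    # Every student in a name group of size > 1 shares their full name with someone.
    shared = sum(n for n in counts.values() if n > 1)
    return shared / len(students)


def name_stats(students):
    """Return (unique first names, unique last names, shared full-name rate)."""
    firsts = {f.strip().upper() for f, _ in students}
    lasts = {l.strip().upper() for _, l in students}
    return len(firsts), len(lasts), shared_full_name_rate(students)
```

For example, a roster with three students named Joshua Smith plus one Ashley Jones and one Justin Williams gives a shared rate of 3/5, because all three Joshua Smiths count as sharing a name.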
More Data Issues
- 4,026 students in Arkansas share an SSN with at least one other student in the state.
- Between August and January, 874 student transfers to other schools resulted in an SSN change.
- Between August and January, an additional 1,018 students changed their SSN; we have records for only 300 of these changes.
- Between August and January, 21,255 students moved to another district in the state, but only 18,986 of them were marked as "withdrawn" (a gap of 2,269 students).
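Checks like the first and last bullets can be expressed as grouping and set-difference queries. The sketch below assumes each enrollment row carries some internal student identifier alongside the SSN; that identifier, the row shape, and the function names are hypothetical, since the slides do not say how students were counted.

```python
from collections import defaultdict


def shared_ssns(enrollments):
    """Map each SSN used by two or more distinct students to the sorted student IDs.

    enrollments: iterable of (student_id, ssn) pairs (hypothetical row shape).
    """
    by_ssn = defaultdict(set)
    for student_id, ssn in enrollments:
        by_ssn[ssn].add(student_id)
    return {ssn: sorted(ids) for ssn, ids in by_ssn.items() if len(ids) > 1}


def students_sharing_ssn(enrollments):
    """Count distinct students who share an SSN with at least one other student."""
    shared = shared_ssns(enrollments)
    return len(set().union(*shared.values()))


def unflagged_moves(moved_ids, withdrawn_ids):
    """Students who moved to another district but were never marked withdrawn."""
    return set(moved_ids) - set(withdrawn_ids)
```

Deduplicating by student before counting matters: a student who appears in several enrollment rows with the same SSN should not be counted as sharing it with themselves.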
The Knowledge Base Approach
"Indicative" information from multiple data sources is stored and merged into an "equivalence class" for each entity, using both fuzzy and logical associations. Knowledge base identifiers are used to manage the references.
Diagram: "Bob Smith, Barton Elementary" is linked to "Robert Smith, Barton Elementary" by a fuzzy match and to "Bob Smith, Wilson Elementary" by a logical match (drop/enroll).
Knowledge Base
Identifier | Representation
KB5765 | Bob Smith, Barton
KB5765 | Robert Smith, Barton
KB5765 | Bob Smith, Wilson
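Merging representations into equivalence classes is a natural fit for a union-find (disjoint-set) structure: each fuzzy or logical match joins two classes, and every class then receives one knowledge base identifier. The slides do not say how the knowledge base is implemented, so treat this as a hypothetical sketch; the `KnowledgeBase` class, its methods, and the choice to start numbering at 5765 (to echo the slide) are illustrative assumptions.

```python
from itertools import count


class KnowledgeBase:
    """Group record representations into equivalence classes using union-find."""

    def __init__(self):
        self._parent = {}

    def add(self, rep):
        """Register a representation as its own class if it is new."""
        self._parent.setdefault(rep, rep)

    def _find(self, rep):
        """Return the class root for rep, compressing the path along the way."""
        root = rep
        while self._parent[root] != root:
            root = self._parent[root]
        while self._parent[rep] != root:
            nxt = self._parent[rep]
            self._parent[rep] = root
            rep = nxt
        return root

    def link(self, a, b):
        """Record a fuzzy or logical association between two representations."""
        self.add(a)
        self.add(b)
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def identifiers(self, start=5765):
        """Assign one KB identifier per class (numbering is illustrative only)."""
        next_id, class_ids, result = count(start), {}, {}
        for rep in self._parent:
            root = self._find(rep)
            if root not in class_ids:
                class_ids[root] = f"KB{next(next_id)}"
            result[rep] = class_ids[root]
        return result
```

Usage mirroring the diagram: `kb.link("Bob Smith, Barton", "Robert Smith, Barton")` for the fuzzy match and `kb.link("Bob Smith, Barton", "Bob Smith, Wilson")` for the drop/enroll link, after which all three representations share one identifier. Because links only ever merge classes, a wrong fuzzy match cannot be undone without rebuilding; a production system would need a way to split classes.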
Two Agencies, Two Regulations
- HIPAA (health information)
- FERPA (education records)
Trusted Broker
A trusted broker maintains a cross-reference table that encodes the identifiers for each agency and for each representation of an entity.
Diagram: ACHI holds the record "Bob Smith" (AC0236), ADE holds the record "Robert Smith" (ED4297), and the trusted broker sits between them.
Identifier | Representation | Identifier Encoded for ACHI | Identifier Encoded for ADE
KB5765 | Bob Smith, Barton | AC0236 | ED4297
KB5765 | Robert Smith, Barton | AC0236 | ED4297
KB5765 | Bob Smith, Wilson | AC0236 | ED4297
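The slides do not say how the per-agency codes are generated. One common way to derive stable, agency-specific pseudonyms from a hidden identifier is a keyed hash (HMAC) with a separate secret per agency, so that codes issued to one agency cannot be matched against another agency's codes without the broker. The sketch below swaps in that technique as an assumption; the keys, prefixes, and eight-digit length are hypothetical, and the four-digit codes on the slide would collide at state scale.

```python
import hashlib
import hmac

# Hypothetical secrets, held only by the trusted broker (never by the agencies).
AGENCY_KEYS = {"ACHI": b"achi-secret-key", "ADE": b"ade-secret-key"}
AGENCY_PREFIX = {"ACHI": "AC", "ADE": "ED"}


def encode_for_agency(kb_id: str, agency: str, digits: int = 8) -> str:
    """Derive a stable, agency-specific code for a hidden KB identifier."""
    mac = hmac.new(AGENCY_KEYS[agency], kb_id.encode("utf-8"), hashlib.sha256)
    number = int(mac.hexdigest(), 16) % 10**digits
    return f"{AGENCY_PREFIX[agency]}{number:0{digits}d}"


def build_cross_reference(kb_ids: dict) -> list:
    """Build broker rows (kb_id, representation, ACHI code, ADE code).

    kb_ids maps each representation to its knowledge base identifier.
    """
    return [
        (kb_id, rep, encode_for_agency(kb_id, "ACHI"), encode_for_agency(kb_id, "ADE"))
        for rep, kb_id in kb_ids.items()
    ]
```

Because the code depends only on the KB identifier and the agency key, every representation in an equivalence class receives the same code, matching the table above. Truncating the hash still permits collisions, so a real broker would need to check for them or use a random assignment table instead.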
Encoded Links
The trusted broker can provide each agency with encoded versions of the (hidden) knowledge base identifiers, protecting all future data requests.
ACHI records:
Bob Smith | AC0236
Robert Smith | AC0236
Bob Smith | AC0236
Katherine Johns | AC0651
Kate Sanders | AC0651
Erica Davis | AC1327
ADE records:
ED4297 | Bob Smith
ED4297 | Robert Smith
ED4297 | Bob Smith
ED8516 | Katherine Johns
ED8516 | Kate Sanders
ED3508 | Erica Davis-Hill
Diagram: ACHI and ADE each receive their own encoded identifiers from the trusted broker.
Data Requests
The trusted broker translates encoded links between agencies for data requests, so no personally identifying information needs to be exchanged.
Request from ACHI to ADE: What are the test scores for the following students? AC0236, AC0651, AC1327
Broker translation: AC0236 ↔ ED4297 | AC0651 ↔ ED8516 | AC1327 ↔ ED3508
ADE data: ED3508 Score: 385 | ED4297 Score: 242 | ED8516 Score: 417
Brokered Result 1: AC0236 Score: 242 | AC0651 Score: 417 | AC1327 Score: 385
Brokered Result 2: Score: 242 | Score: 385 | Score: 417
Brokered Result 3: Average Score: 348
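The translation step above can be sketched as two lookups performed only by the broker: ACHI code to ADE code via the cross-reference table, then ADE code to score via ADE's data. This is a hypothetical sketch; the function names and the row shape `(kb_id, representation, ac_code, ed_code)` are assumptions, and IDs the broker cannot link are simply dropped from the answer.

```python
def link_table(xref_rows):
    """Build an ACHI-code to ADE-code map from broker rows (kb_id, rep, ac, ed)."""
    return {ac: ed for _kb, _rep, ac, ed in xref_rows}


def broker_request(request_ids, ac_to_ed, ade_scores):
    """Answer an ACHI request using ADE scores, keyed by ACHI codes only.

    Neither agency sees the other's codes or any names; only the broker
    holds ac_to_ed.
    """
    result = {}
    for ac_id in request_ids:
        ed_id = ac_to_ed.get(ac_id)
        if ed_id is not None and ed_id in ade_scores:
            result[ac_id] = ade_scores[ed_id]
    return result
```

With the slide's links (AC0236 ↔ ED4297, AC0651 ↔ ED8516, AC1327 ↔ ED3508) and ADE scores (ED3508: 385, ED4297: 242, ED8516: 417), the request for AC0236, AC0651, and AC1327 returns 242, 417, and 385 respectively, which is Brokered Result 1.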
Result Options
The trusted broker may deliver results between agencies in a variety of ways, without exchanging personally identifying information:
- Brokered Result 1, individual-level results with encoded links (safe, encoded): AC0236 Score: 242 | AC0651 Score: 417 | AC1327 Score: 385
- Brokered Result 2, individual-level results without links, in random order (safe, anonymous): Score: 242 | Score: 385 | Score: 417
- Brokered Result 3, aggregated results (safe, anonymous): Average Score: 348
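The three delivery options differ only in how much linkage survives in the answer. The sketch below starts from the broker's linked result (ACHI code to score) and produces each form; the function names are hypothetical, and shuffling with `random.shuffle` is one assumed way to implement "without links, random".

```python
import random
from statistics import mean


def result_with_links(scores_by_ac):
    """Brokered Result 1: individual scores keyed by the requester's encoded IDs."""
    return dict(scores_by_ac)


def result_without_links(scores_by_ac, rng=None):
    """Brokered Result 2: individual scores with IDs removed and order shuffled."""
    values = list(scores_by_ac.values())
    (rng or random.Random()).shuffle(values)
    return values


def result_aggregate(scores_by_ac):
    """Brokered Result 3: a single aggregate (here, the mean) over the group."""
    return mean(scores_by_ac.values())
```

For the slide's scores (242, 417, 385), the aggregate is 1044 / 3 = 348, matching Brokered Result 3. Shuffling matters for Result 2: returning scores in request order would let the requester re-attach them to its own IDs.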