Lecture 17: SQL Application Persistence Design Patterns
SQLite
A relational database management system.
Not a client–server database engine: it is embedded into the program itself.
The database is stored in a file.
SQLite is widely used as embedded database software for local storage in application software such as web browsers.
It is arguably the most widely deployed database engine, as it is used today by several widespread browsers.
In Python, we will use the sqlite3 library.
SQLite: connecting to database

    import sqlite3

    # connect to an existing database;
    # if the file does not exist, it will be created!
    connection = sqlite3.connect('mydb.db')

    # the connection object received is used to execute
    # both commands and queries on the database
SQLite: creating a table - command

    connection.execute("""
        CREATE TABLE students (
            id      INT     PRIMARY KEY,
            name    TEXT    NOT NULL
        )
    """)
SQLite: inserting data - command

    id = 0
    name = 'mark'
    # the ? placeholders are filled, in order, from the list of parameters
    connection.execute("""
        INSERT INTO students (id, name) VALUES (?, ?)
    """, [id, name])
SQLite: retrieving data - query

    # executing a statement that returns a value
    cursor = connection.cursor()
    id = 0
    # parameters must be passed as a sequence, even when there is only one
    cursor.execute("""
        SELECT * FROM students WHERE id = ?
    """, [id])

    # the result is stored inside the cursor; we can retrieve it
    # as a list of tuples using:
    rows = cursor.fetchall()

    # or, if we know that there is a single row, use this instead
    # (fetchall() consumes the cursor, so call one or the other):
    row = cursor.fetchone()
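The four SQLite slides above can be combined into one small script. The following is a minimal sketch, assuming an in-memory database (':memory:') instead of mydb.db so that no file is left behind; the second student, 'dana', is made up for illustration.

```python
import sqlite3

# ':memory:' keeps the whole database in RAM (an assumption of this sketch)
connection = sqlite3.connect(':memory:')

# command: create the table
connection.execute("""
    CREATE TABLE students (
        id      INT     PRIMARY KEY,
        name    TEXT    NOT NULL
    )
""")

# commands: insert two rows, passing the values as parameters
connection.execute("INSERT INTO students (id, name) VALUES (?, ?)", [0, 'mark'])
connection.execute("INSERT INTO students (id, name) VALUES (?, ?)", [1, 'dana'])

cursor = connection.cursor()

# query expected to match a single row: fetchone() returns one tuple
cursor.execute("SELECT id, name FROM students WHERE id = ?", [0])
one_row = cursor.fetchone()

# query matching several rows: fetchall() returns a list of tuples
cursor.execute("SELECT id, name FROM students ORDER BY id")
all_rows = cursor.fetchall()

# query matching nothing: fetchone() returns None
cursor.execute("SELECT id, name FROM students WHERE id = ?", [42])
missing = cursor.fetchone()

print(one_row, all_rows, missing)
connection.close()
```

Because fetchone() returns None when nothing matches, code that unpacks the result directly should first make sure a row was actually found.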
Data Persistence Layer
Definition:
    The persistence layer deals with persisting the data of a data store.
    Persisting means storing, modifying, and retrieving data.
    A data store can be any data storage format, e.g. a database.
Purpose:
    Creates a separation between the application logic and the database access and manipulation.
Comparison to networking:
    ConnectionHandler: executes reads/writes
    ServerProtocol: uses the retrieved data and creates responses for sending
The persistence layer consists of three parts:
    Data Transfer Objects (DTO)
    Data Access Objects (DAO)
    Repository (repo)
Persistence Layer Components
Data Transfer Object (DTO)
    An object that represents a record from a single table.
    Its variables represent the columns of the table.
Data Access Object (DAO)
    Contains methods for retrieving and storing DTOs.
    Each DAO is responsible for a single DTO.
Repository
    Contains commands/queries that span multiple tables.
    Effectively manages multiple related DTOs.
Data Transfer Object
DTOs are passed to and from the persistence layer.
If transferred to the application logic:
    They contain the data retrieved from the database, i.e. a query result.
If transferred to the persistence layer:
    They contain the data that is to be written to the database.
    The data will be added using commands.
Naming convention:
    The DTO name is the singular of the plural table name.
    Example: for a table named "grades", the DTO is named "grade".
    Names of DTO constructor parameters == names of DTO fields == column names of the table represented by the DTO
Without Persistence Layer: Students Grades Example
Three entities:
    Students
    Assignments
    Grades
Characteristics:
    Each student has a unique id.
    Assignments are numbered (e.g., assignment 1, 2, ...).
    Each submitted assignment contains only a single *.py file with a method run_assignment that accepts no arguments and returns a string.
    The submitted file name is of the form <submitter student id>.py
Creating the tables

    # _conn is the module-level sqlite3 connection used by all functions in this example
    def create_tables():
        _conn.executescript("""
            CREATE TABLE students (
                id      INT     PRIMARY KEY,
                name    TEXT    NOT NULL
            );

            CREATE TABLE assignments (
                num                 INT     PRIMARY KEY,
                expected_output     TEXT    NOT NULL
            );
Creating the tables: continued

            CREATE TABLE grades (
                student_id      INT     NOT NULL,
                assignment_num  INT     NOT NULL,
                grade           INT     NOT NULL,

                FOREIGN KEY(student_id)     REFERENCES students(id),
                FOREIGN KEY(assignment_num) REFERENCES assignments(num),

                PRIMARY KEY (student_id, assignment_num)
            );
        """)
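To see what the constraints in this schema do in practice, here is a minimal sketch run against an in-memory database. It relies on one SQLite detail the slides do not mention: FOREIGN KEY clauses are only enforced once PRAGMA foreign_keys = ON has been issued on the connection. The helper try_insert_grade and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# SQLite ignores FOREIGN KEY clauses unless this pragma is switched on
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE students (
        id      INT     PRIMARY KEY,
        name    TEXT    NOT NULL
    );
    CREATE TABLE assignments (
        num                 INT     PRIMARY KEY,
        expected_output     TEXT    NOT NULL
    );
    CREATE TABLE grades (
        student_id      INT     NOT NULL,
        assignment_num  INT     NOT NULL,
        grade           INT     NOT NULL,
        FOREIGN KEY(student_id)     REFERENCES students(id),
        FOREIGN KEY(assignment_num) REFERENCES assignments(num),
        PRIMARY KEY (student_id, assignment_num)
    );
    INSERT INTO students VALUES (1, 'mark');
    INSERT INTO assignments VALUES (1, 'hello');
    INSERT INTO assignments VALUES (2, 'world');
    INSERT INTO grades VALUES (1, 1, 100);
""")

def try_insert_grade(student_id, assignment_num, grade):
    """Hypothetical helper: True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO grades VALUES (?, ?, ?)",
                     [student_id, assignment_num, grade])
        return True
    except sqlite3.IntegrityError:
        return False

valid_ok = try_insert_grade(1, 2, 0)        # new (student, assignment) pair
duplicate_ok = try_insert_grade(1, 1, 0)    # pair already graded: primary key violation
orphan_ok = try_insert_grade(99, 1, 50)     # student 99 does not exist: foreign key violation

print(valid_ok, duplicate_ok, orphan_ok)
```

The composite primary key guarantees at most one grade per student per assignment, and the foreign keys (when enforced) guarantee that every grade refers to an existing student and assignment.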
Adding some data

    def insert_student(id, name):
        _conn.execute("""
            INSERT INTO students (id, name) VALUES (?, ?)
        """, [id, name])

    def insert_assignment(num, expected_output):
        _conn.execute("""
            INSERT INTO assignments (num, expected_output) VALUES (?, ?)
        """, [num, expected_output])

    def insert_grade(student_id, assignment_num, grade):
        _conn.execute("""
            INSERT INTO grades (student_id, assignment_num, grade) VALUES (?, ?, ?)
        """, [student_id, assignment_num, grade])
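Adding Grades

    import importlib.util
    import os

    def grade(assignments_dir, assignment_num):
        query_data = _conn.cursor()
        query_data.execute("""
            SELECT expected_output FROM assignments WHERE num = ?
        """, [assignment_num])
        expected_output = query_data.fetchone()[0]

        for assignment in os.listdir(assignments_dir):
            # the file name has the form <submitter student id>.py
            (student_id, ext) = os.path.splitext(assignment)

            # load the submitted file as a module
            # (imp.load_source was removed in Python 3.12; importlib replaces it)
            path = os.path.join(assignments_dir, assignment)
            spec = importlib.util.spec_from_file_location('test', path)
            code = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(code)

            grade = 100 if code.run_assignment() == expected_output else 0
            insert_grade(student_id, assignment_num, grade)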
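The grading loop depends on two steps: splitting the file name into a student id and an extension, and loading a submitted file as a module so that run_assignment can be called. Below is a minimal sketch of just those two steps, assuming Python 3 and importlib; the helper name load_submission, the student id 1234, and the submission contents are made up for illustration.

```python
import importlib.util
import os
import tempfile

def load_submission(path):
    """Hypothetical helper: load the Python file at `path` as a module and return it."""
    spec = importlib.util.spec_from_file_location('submission', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as assignments_dir:
    # a made-up submission by the student whose id is 1234
    path = os.path.join(assignments_dir, '1234.py')
    with open(path, 'w') as f:
        f.write("def run_assignment():\n    return 'hello'\n")

    # '1234.py' -> ('1234', '.py')
    student_id, ext = os.path.splitext(os.path.basename(path))

    # run the submitted code and capture the string it returns
    output = load_submission(path).run_assignment()

print(student_id, ext, output)
```

Note that student_id is a string here. SQLite converts text that looks like an integer when it is stored in an INT column, but converting explicitly with int(student_id) makes the intent clearer.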
Printing Grades

    def print_grades():
        cur = _conn.cursor()
        cur.execute("""
            SELECT name as student_name, assignment_num, grade
            FROM students INNER JOIN grades ON students.id = grades.student_id
        """)
        print('grades:')
        for row in cur.fetchall():
            print('grade of student {} on assignment {} is {}'.format(*row))

Weaknesses:
    Database creation, manipulation, and access are coupled with the program logic.
    Any database change requires us to go over every function to make sure it is still correct.
Solution:
    Add a layer between the database and the program logic. A database change then forces us to modify only the persistence layer, without modifying the program logic.
    This matters because the program logic is typically much larger than the persistence layer.
    By separating database-related functions from the program logic, we organize our code for easier modification as needed.
With Persistence Layer: Data Transfer Objects
Data Transfer Objects: [the data containers]
    One class per database table: Student, Assignment, Grade
    These are the objects that are passed to and from the persistence layer

    class Student(object):
        def __init__(self, id, name):
            self.id = id
            self.name = name

    class Assignment(object):
        def __init__(self, num, expected_output):
            self.num = num
            self.expected_output = expected_output

    class Grade(object):
        def __init__(self, student_id, assignment_num, grade):
            self.student_id = student_id
            self.assignment_num = assignment_num
            self.grade = grade
With Persistence Layer: Data Access Objects
Data Access Objects: [the database manipulation functions]
    Each contains the command and query logic for one data transfer class
    Commands: update the database using the values of data transfer objects
    Queries: return the results retrieved from the database as data transfer objects
Three data access classes:
    Students: insert a new student, find a specific student by id
    Assignments: insert a new assignment, find an assignment by its number
    Grades: insert a new grade, find all grades of all students (find_all)
Data Access Objects: Students

    class _Students:
        def __init__(self, conn):
            self._conn = conn

        def insert(self, student):
            self._conn.execute("""
                INSERT INTO students (id, name) VALUES (?, ?)
            """, [student.id, student.name])

        def find(self, student_id):
            c = self._conn.cursor()
            c.execute("""
                SELECT id, name FROM students WHERE id = ?
            """, [student_id])
            return Student(*c.fetchone())

Data Access Objects: Assignments

    class _Assignments:
        def __init__(self, conn):
            self._conn = conn

        def insert(self, assignment):
            self._conn.execute("""
                INSERT INTO assignments (num, expected_output) VALUES (?, ?)
            """, [assignment.num, assignment.expected_output])

        def find(self, num):
            c = self._conn.cursor()
            c.execute("""
                SELECT num, expected_output FROM assignments WHERE num = ?
            """, [num])
            return Assignment(*c.fetchone())

Data Access Objects: Grades

    class _Grades:
        def __init__(self, conn):
            self._conn = conn

        def insert(self, grade):
            self._conn.execute("""
                INSERT INTO grades (student_id, assignment_num, grade) VALUES (?, ?, ?)
            """, [grade.student_id, grade.assignment_num, grade.grade])

        def find_all(self):
            c = self._conn.cursor()
            rows = c.execute("""
                SELECT student_id, assignment_num, grade FROM grades
            """).fetchall()
            return [Grade(*row) for row in rows]
With Persistence Layer: Repositories
The repository is the access point to the database logic.

    class _Repository(object):
        def __init__(self):
            self._conn = sqlite3.connect('grades.db')
            self.students = _Students(self._conn)
            self.assignments = _Assignments(self._conn)
            self.grades = _Grades(self._conn)

        def _close(self):
            self._conn.commit()
            self._conn.close()

        def create_tables(self):
            # implementation in next two slides

    # the repository singleton
    repository = _Repository()
Creating the tables

        # a method of _Repository
        def create_tables(self):
            self._conn.executescript("""
                CREATE TABLE students (
                    id      INT     PRIMARY KEY,
                    name    TEXT    NOT NULL
                );

                CREATE TABLE assignments (
                    num                 INT     PRIMARY KEY,
                    expected_output     TEXT    NOT NULL
                );
Creating the tables: continued

                CREATE TABLE grades (
                    student_id      INT     NOT NULL,
                    assignment_num  INT     NOT NULL,
                    grade           INT     NOT NULL,

                    FOREIGN KEY(student_id)     REFERENCES students(id),
                    FOREIGN KEY(assignment_num) REFERENCES assignments(num),

                    PRIMARY KEY (student_id, assignment_num)
                );
            """)
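The three parts of the persistence layer can be exercised together in a few lines. This is a reduced sketch, not the full example: it keeps only the Student DTO and its DAO, and it assumes the repository takes the database path as a constructor parameter (db_path, not in the slides) so that an in-memory database can be used.

```python
import sqlite3

# DTO: one record of the students table
class Student(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

# DAO: commands and queries for the Student DTO
class _Students(object):
    def __init__(self, conn):
        self._conn = conn

    def insert(self, student):
        self._conn.execute("INSERT INTO students (id, name) VALUES (?, ?)",
                           [student.id, student.name])

    def find(self, student_id):
        c = self._conn.cursor()
        c.execute("SELECT id, name FROM students WHERE id = ?", [student_id])
        return Student(*c.fetchone())

# Repository: owns the connection and exposes the DAOs
class _Repository(object):
    def __init__(self, db_path):
        # db_path is an assumption of this sketch; the slides hard-code 'grades.db'
        self._conn = sqlite3.connect(db_path)
        self.students = _Students(self._conn)

    def create_tables(self):
        self._conn.executescript("""
            CREATE TABLE students (
                id      INT     PRIMARY KEY,
                name    TEXT    NOT NULL
            );
        """)

    def _close(self):
        self._conn.commit()
        self._conn.close()

repository = _Repository(':memory:')
repository.create_tables()

# the application logic works with DTOs only; no SQL appears here
repository.students.insert(Student(1, 'mark'))
found = repository.students.find(1)
print(found.id, found.name)

repository._close()
```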
With Persistence Layer: Adding Grades

    import importlib.util
    import os

    def grade(assignments_dir, assignment_num):
        expected_output = repository.assignments.find(assignment_num).expected_output

        for assignment in os.listdir(assignments_dir):
            (student_id, ext) = os.path.splitext(assignment)

            # load the submitted file as a module
            # (imp.load_source was removed in Python 3.12; importlib replaces it)
            path = os.path.join(assignments_dir, assignment)
            spec = importlib.util.spec_from_file_location('test', path)
            code = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(code)

            student_grade = Grade(student_id, assignment_num, 0)
            if code.run_assignment() == expected_output:
                student_grade.grade = 100

            repository.grades.insert(student_grade)

No SQL code here!
All SQL code lives in the persistence layer classes (the data access objects and the repository), far away from the application logic.
The application logic only calls functions.
With Persistence Layer: Printing Grades

    def print_grades():
        print('grades:')
        for grade in repository.grades.find_all():
            student = repository.students.find(grade.student_id)

            print('grade of student {} on assignment {} is {}'
                  .format(student.name, grade.assignment_num, grade.grade))
Generalization: Making the Code Generic
Code repetition:
    The find() and insert() methods generally look the same; only the table differs.
Solution: Object-Relational Mapping (ORM)
    A generic routine that converts data to its corresponding data transfer object.
    We can then use it to create a generic data access object class.
Code Repetition: Students

    class _Students:
        def __init__(self, conn):
            self._conn = conn

        def insert(self, student):
            self._conn.execute("""
                INSERT INTO students (id, name) VALUES (?, ?)
            """, [student.id, student.name])

        def find(self, student_id):
            c = self._conn.cursor()
            c.execute("""
                SELECT id, name FROM students WHERE id = ?
            """, [student_id])
            return Student(*c.fetchone())

Code Repetition: Assignments

    class _Assignments:
        def __init__(self, conn):
            self._conn = conn

        def insert(self, assignment):
            self._conn.execute("""
                INSERT INTO assignments (num, expected_output) VALUES (?, ?)
            """, [assignment.num, assignment.expected_output])

        def find(self, num):
            c = self._conn.cursor()
            c.execute("""
                SELECT num, expected_output FROM assignments WHERE num = ?
            """, [num])
            return Assignment(*c.fetchone())

Code Repetition: Grades

    class _Grades:
        def __init__(self, conn):
            self._conn = conn

        def insert(self, grade):
            self._conn.execute("""
                INSERT INTO grades (student_id, assignment_num, grade) VALUES (?, ?, ?)
            """, [grade.student_id, grade.assignment_num, grade.grade])

        def find_all(self):
            c = self._conn.cursor()
            rows = c.execute("""
                SELECT student_id, assignment_num, grade FROM grades
            """).fetchall()
            return [Grade(*row) for row in rows]
Object-Relational Mapping: Requirements
Each data transfer object class represents a single table.
    A class named Student represents a table named students.
Each data transfer object class has a constructor that accepts all of its fields.
The names of the constructor arguments are the same as the names of the fields.
The names of the fields of each DTO class are the same as the column names in the database.
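These requirements can be checked mechanically. The following is a minimal sketch, assuming the table name is derived as the lowercase class name plus 's'; the function dto_matches_table and the Teacher class are hypothetical and exist only for illustration. It reads the column names with SQLite's PRAGMA table_info.

```python
import inspect
import sqlite3

class Student(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

class Teacher(object):      # made-up DTO with no matching table
    def __init__(self, id, name):
        self.id = id
        self.name = name

def dto_matches_table(conn, dto_type):
    """Hypothetical check: do the constructor arguments equal the table's columns?"""
    table_name = dto_type.__name__.lower() + 's'    # Student -> students

    # constructor argument names, without 'self'
    ctor_args = inspect.getfullargspec(dto_type.__init__).args[1:]

    # each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk);
    # a table that does not exist yields no rows
    info = conn.execute('PRAGMA table_info({})'.format(table_name)).fetchall()
    columns = [row[1] for row in info]

    return sorted(ctor_args) == sorted(columns)

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE students (id INT PRIMARY KEY, name TEXT NOT NULL)")

student_ok = dto_matches_table(conn, Student)
teacher_ok = dto_matches_table(conn, Teacher)
print(student_ok, teacher_ok)
```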
Object-Relational Mapping: Implementation
Input arguments:
    A cursor
    A data transfer object type
Process:
    Find the constructor arguments of the data transfer object.
    Find the column names inside the cursor.
    Create a mapping array col_mapping:
        for each constructor argument i, col_mapping[i] is the index of the corresponding column in the rows returned by the cursor.
    Loop over the rows inside the cursor and construct one DTO object per row, using the col_mapping array.
Object-Relational Mapping: Implementation Code

    import inspect

    def orm(cursor, dto_type):
        # retrieve the argument names of the constructor
        # (inspect.getargspec was removed in Python 3.11; getfullargspec replaces it)
        args = inspect.getfullargspec(dto_type.__init__).args

        # ignore the 'self' element
        args = args[1:]

        # get the names of the columns returned in the cursor
        col_names = [column[0] for column in cursor.description]

        # map each constructor argument to the position of its column
        col_mapping = [col_names.index(arg) for arg in args]
        return [row_map(row, col_mapping, dto_type) for row in cursor.fetchall()]

    def row_map(row, col_mapping, dto_type):
        ctor_args = [row[idx] for idx in col_mapping]
        return dto_type(*ctor_args)
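The role of col_mapping is easiest to see when the query returns the columns in a different order than the constructor expects. The sketch below assumes Python 3 (hence inspect.getfullargspec) and an in-memory database with a single made-up row; orm folds row_map into one expression to stay short.

```python
import inspect
import sqlite3

class Grade(object):
    def __init__(self, student_id, assignment_num, grade):
        self.student_id = student_id
        self.assignment_num = assignment_num
        self.grade = grade

def orm(cursor, dto_type):
    # constructor argument names, without 'self'
    args = inspect.getfullargspec(dto_type.__init__).args[1:]
    # column names, in the order the query returned them
    col_names = [column[0] for column in cursor.description]
    # col_mapping[i] is the row index holding constructor argument i
    col_mapping = [col_names.index(arg) for arg in args]
    return [dto_type(*[row[idx] for idx in col_mapping])
            for row in cursor.fetchall()]

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE grades (student_id INT, assignment_num INT, grade INT)")
conn.execute("INSERT INTO grades VALUES (7, 2, 100)")

# the columns are deliberately selected in a different order
# than the constructor arguments
cursor = conn.execute("SELECT grade, student_id, assignment_num FROM grades")
col_names = [column[0] for column in cursor.description]

# the same mapping that orm() computes internally, shown here for inspection
col_mapping = [col_names.index(arg)
               for arg in ['student_id', 'assignment_num', 'grade']]

grades = orm(cursor, Grade)
print(col_names, col_mapping, vars(grades[0]))
```

Here the row is (100, 7, 2), and col_mapping reorders it so that each value reaches the constructor argument with the matching name.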
Generic Data Access Object: Class and Constructor

    class Dao(object):
        def __init__(self, dto_type, connection):
            self._connection = connection
            self._dto_type = dto_type

            # dto_type is a class; its __name__ field contains the class name
            self._table_name = dto_type.__name__.lower() + 's'

Each instance of the generic Dao class wraps one data transfer object type and provides it with the needed access functions.
Generic Data Access Object: insert, find_all

        def insert(self, dto_instance):
            # vars() returns the instance fields as a {field name: value} dictionary
            ins_dict = vars(dto_instance)

            column_names = ','.join(ins_dict.keys())
            # sqlite3 expects a sequence of parameters, so convert the dict view to a list
            parameters = list(ins_dict.values())
            question_marks = ','.join(['?'] * len(ins_dict))

            statement = 'INSERT INTO {} ({}) VALUES ({})' \
                .format(self._table_name, column_names, question_marks)

            self._connection.execute(statement, parameters)

        def find_all(self):
            c = self._connection.cursor()
            c.execute('SELECT * FROM {}'.format(self._table_name))
            return orm(c, self._dto_type)
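To make the generated statement visible, the following sketch runs the generic insert and find_all against an in-memory database. It assumes Python 3. As a deviation from the slides, insert returns the statement it built, purely so that it can be printed; the student 'dana' is made up.

```python
import inspect
import sqlite3

class Student(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

def orm(cursor, dto_type):
    args = inspect.getfullargspec(dto_type.__init__).args[1:]
    col_names = [column[0] for column in cursor.description]
    col_mapping = [col_names.index(arg) for arg in args]
    return [dto_type(*[row[idx] for idx in col_mapping])
            for row in cursor.fetchall()]

class Dao(object):
    def __init__(self, dto_type, connection):
        self._connection = connection
        self._dto_type = dto_type
        self._table_name = dto_type.__name__.lower() + 's'

    def insert(self, dto_instance):
        ins_dict = vars(dto_instance)               # {'id': ..., 'name': ...}
        column_names = ','.join(ins_dict.keys())
        parameters = list(ins_dict.values())
        question_marks = ','.join(['?'] * len(ins_dict))
        statement = 'INSERT INTO {} ({}) VALUES ({})' \
            .format(self._table_name, column_names, question_marks)
        self._connection.execute(statement, parameters)
        return statement    # returned only so this sketch can display it

    def find_all(self):
        c = self._connection.cursor()
        c.execute('SELECT * FROM {}'.format(self._table_name))
        return orm(c, self._dto_type)

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE students (id INT PRIMARY KEY, name TEXT NOT NULL)")

students = Dao(Student, conn)
statement = students.insert(Student(1, 'mark'))
students.insert(Student(2, 'dana'))

names = sorted(student.name for student in students.find_all())
print(statement, names)
```

The column list and the parameter list come from the same dictionary, so they always line up, whatever order the fields were assigned in.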
Generic Data Access Object: find

        def find(self, **keyvals):
            column_names = keyvals.keys()
            parameters = list(keyvals.values())

            statement = 'SELECT * FROM {} WHERE {}' \
                .format(self._table_name, ' AND '.join([col + '=?' for col in column_names]))

            cursor = self._connection.cursor()
            cursor.execute(statement, parameters)
            return orm(cursor, self._dto_type)

Line 1: prefixing a parameter with ** makes the function accept any number of key=value arguments, which are collected into a dictionary. Example:

    def foo(dict):
        print(dict)

    def cool_foo(**dict):
        foo(dict)

    foo({'a': 1, 'b': 2})   # prints {'a': 1, 'b': 2}
    cool_foo(a=1, b=2)      # prints {'a': 1, 'b': 2}
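The way find turns keyword arguments into a WHERE clause can be examined in isolation. The sketch below pulls the string-building step out into a standalone function, build_find, which is a hypothetical helper and not part of the Dao class; the table contents are made up.

```python
import sqlite3

def build_find(table_name, **keyvals):
    """Hypothetical helper: build the SELECT statement and its parameter list."""
    column_names = keyvals.keys()
    parameters = list(keyvals.values())
    where = ' AND '.join([col + '=?' for col in column_names])
    statement = 'SELECT * FROM {} WHERE {}'.format(table_name, where)
    return statement, parameters

# keyword arguments become column=? conditions, joined by AND
statement, parameters = build_find('grades', student_id=7, assignment_num=2)

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE grades (student_id INT, assignment_num INT, grade INT)")
conn.executemany("INSERT INTO grades VALUES (?, ?, ?)",
                 [(7, 1, 0), (7, 2, 100), (8, 2, 100)])

rows = conn.execute(statement, parameters).fetchall()
print(statement, parameters, rows)
```

Only the values are bound through ? placeholders. The table and column names are formatted directly into the SQL text, so they should come from the program itself and never from user input. Calling find with no keyword arguments would also produce an incomplete statement ending in WHERE.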
Generic Code: Repository

    class Repository(object):
        def __init__(self):
            self._conn = sqlite3.connect('grades.db')
            # TEXT values are returned as bytes objects
            self._conn.text_factory = bytes
            self.students = Dao(Student, self._conn)
            self.assignments = Dao(Assignment, self._conn)
            self.grades = Dao(Grade, self._conn)

Instead of having three DAO classes, we have one generic Dao class.
This class wraps any DTO class and is used to access that DTO's table.