Published by Gary Day. Modified over 9 years ago.
1
Full-Text Support in a Database Semantic File System
Kristen LeFevre & Kevin Roundy
Computer Sciences 736
2
Leveraging DBs in File Systems
What do databases have to offer?
- Transactions
- Concurrency control
- Crash recovery
- Query power (metadata)
- Extensibility – add new objects/modules
- Efficient Search!
3
Re-thinking Directories
Current state of directories:
- User remembers what, not where
Our System:
- Search tools for grouping related files
- Semantically meaningful directories [Semantic FS]
- Files are stored in tables
- Directories are just for looks
LAME!
4
Related Work
- Semantic Filesystems
- Use a DB [Inversion Filesystem]
- NFS Meets Databases [Halverson]
  - NFS for portability, transparency, existing code support, familiar semantics
  - Server-side caching for performance
- Bringing ideas together: use [Halverson]'s infrastructure to implement semantic filesystem ideas
5
Roadmap
- Overview of System Design and Implementation
- Virtual Directories and Full-Text Queries
- Live Demonstration
- Conclusions & Future Work
6
System Architecture
[Diagram: Standard NFS Clients (Client) -> NFS Server (NFS Front End, Custom Backend...) -> Object-Relational Database (Storage, with module boxes labeled M and TS2)]
7
Postgres Capabilities
An object-relational DB such as Postgres lets you define and add modules.
Case in point: Tsearch2
- New type: tsvector
- Related function: to_tsvector
  to_tsvector('a b a c');
  'a':1,3 'b':2 'c':4
- Related index: idxFTI
- Set triggers to do updates
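The to_tsvector example above can be reproduced with a toy function. This is only a sketch of the word-to-positions idea: the names `toy_tsvector` and `format_tsvector` are made up for illustration, and the real Tsearch2 function also stems words and drops stopwords, which this version does not.

```python
from collections import defaultdict


def toy_tsvector(text):
    """Map each whitespace-separated word to its 1-based positions."""
    positions = defaultdict(list)
    for pos, word in enumerate(text.split(), start=1):
        positions[word].append(pos)
    return dict(positions)


def format_tsvector(vec):
    """Render the mapping in the 'word':pos,pos style shown on the slide."""
    return " ".join(
        "'{}':{}".format(word, ",".join(str(p) for p in vec[word]))
        for word in sorted(vec)
    )


print(format_tsvector(toy_tsvector("a b a c")))  # 'a':1,3 'b':2 'c':4
```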
8
Mapping FS data to DB Schema
Filesystem Data            Database Tables
Metadata                   fileatt
Directory Structure        naming
Non-indexed File Content   allfiles
Indexed File Content       allfiles_txt
9
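[Halverson] Schema
fileatt(inode, uid, gid, mode, nlinks, size, ctime, mtime, atime)
naming(inode, name, parent)
allfiles(inode, chunk_id, data)
[Diagram: the tables are linked by relationship lines carrying cardinality labels (1 and N)]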
10
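Database Schema
fileatt(inode, uid, gid, mode, nlinks, size, ctime, mtime, atime, istext)
naming(inode, name, parent)
allfiles(inode, chunk_id, data)
[Diagram: the tables are linked by relationship lines carrying cardinality labels (1 and N); annotation: strstr(a, ".txt")]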
11
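Database Schema
fileatt(inode, uid, gid, mode, nlinks, size, ctime, mtime, atime, istext)
naming(inode, name, parent)
allfiles(inode, chunk_id, data)
allfiles_txt(inode, fulltext, tsvector), with a tsearch2 index
[Diagram: the tables are linked by relationship lines carrying cardinality labels (1 and N); annotation: strstr(a, ".txt")]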
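A minimal sketch of this schema, using an in-memory SQLite database as a stand-in for Postgres. Everything beyond the table and column names is an assumption: the column types, the keys, the plain TEXT column standing in for the tsvector type, and the helper names `is_text` and `create_file`. The slide's strstr(a, ".txt") annotation is read here as the rule that sets istext, which the slides do not state outright.

```python
import sqlite3

# Column types and keys are assumptions; only the names come from the slides.
SCHEMA = """
CREATE TABLE fileatt (
    inode INTEGER PRIMARY KEY,
    uid INTEGER, gid INTEGER, mode INTEGER, nlinks INTEGER, size INTEGER,
    ctime INTEGER, mtime INTEGER, atime INTEGER,
    istext INTEGER
);
CREATE TABLE naming (
    inode INTEGER REFERENCES fileatt(inode),
    name TEXT,
    parent INTEGER
);
CREATE TABLE allfiles (
    inode INTEGER REFERENCES fileatt(inode),
    chunk_id INTEGER,
    data BLOB,
    PRIMARY KEY (inode, chunk_id)
);
CREATE TABLE allfiles_txt (
    inode INTEGER PRIMARY KEY REFERENCES fileatt(inode),
    fulltext TEXT,
    tsvector TEXT
);
"""


def is_text(name):
    # Mirrors strstr(a, ".txt"): a substring test, not a suffix test.
    return ".txt" in name


def create_file(db, inode, name, parent, content):
    """Hypothetical helper: route content to the indexed or non-indexed table."""
    text = is_text(name)
    db.execute(
        "INSERT INTO fileatt (inode, nlinks, size, istext) VALUES (?, 1, ?, ?)",
        (inode, len(content), int(text)),
    )
    db.execute("INSERT INTO naming VALUES (?, ?, ?)", (inode, name, parent))
    if text:
        db.execute(
            "INSERT INTO allfiles_txt (inode, fulltext) VALUES (?, ?)",
            (inode, content.decode("utf-8")),
        )
    else:
        db.execute("INSERT INTO allfiles VALUES (?, 0, ?)", (inode, content))


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
create_file(db, 2, "notes.txt", 1, b"NFS meets databases")
create_file(db, 3, "photo.jpg", 1, b"\xff\xd8")
print(db.execute("SELECT inode, istext FROM fileatt ORDER BY inode").fetchall())
```

In the real system the tsvector column would be filled by to_tsvector and kept current by triggers; the sketch leaves it empty.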
12
Roadmap
- Overview of System Design and Implementation
- Virtual Directories and Full-Text Queries
- Live Demonstration
- Conclusions & Future Work
13
Virtual Directories and Text Search
Want to handle 2 types of text queries:
- Boolean keyword queries, e.g. ('Kristen' | 'Kevin' | 'Remzi') & 'file' & 'system'
- IR rank queries, e.g. rank files with respect to ('computer' & 'architecture')
More powerful than grep!
Virtual directories proposed for Semantic File systems
- Incorporate full-text queries without "breaking" NFS interface for existing applications
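The Boolean query in the example above can be evaluated with set operations over an inverted index. This is a sketch of the idea only, not how Tsearch2 evaluates queries; the file names, file contents, and the helpers `build_index` and `term` are invented for illustration.

```python
def build_index(docs):
    """Build a toy inverted index: lowercase word -> set of file names."""
    index = {}
    for name, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)
    return index


def term(index, word):
    """Return the set of files containing the word (empty set if none)."""
    return index.get(word.lower(), set())


# Hypothetical file contents.
docs = {
    "a.txt": "Kristen wrote a file system",
    "b.txt": "Kevin likes the file cache",
    "c.txt": "Remzi teaches file system design",
    "d.txt": "a file system with no authors",
}
index = build_index(docs)

# ('Kristen' | 'Kevin' | 'Remzi') & 'file' & 'system'
matches = (
    (term(index, "Kristen") | term(index, "Kevin") | term(index, "Remzi"))
    & term(index, "file")
    & term(index, "system")
)
print(sorted(matches))  # ['a.txt', 'c.txt']
```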
14
DBMS Full-Text Support
Keyword Search
- Text indices support search over keywords
- Words extracted from document, stemmed, "stopwords" removed
Rank
- Used existing rank() function as a black box
- rank() counts number of times each word appears in document, and whether search terms are near one another
- Optionally, normalize by document length
- Other notions of IR rank could easily be substituted
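To make the ranking description concrete, here is a toy scoring function with the same ingredients: term counts, a proximity bonus, and optional length normalization. It is not the rank() function the authors used as a black box. The weights, the stopword list, and the names `tokenize` and `toy_rank` are assumptions, and stemming is left out.

```python
STOPWORDS = {"a", "an", "and", "of", "the", "to"}  # illustrative list only


def tokenize(text):
    """Lowercase, strip punctuation, drop stopwords. Stemming is omitted."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def toy_rank(text, query_terms, normalize=False):
    """Score = number of query-term hits + 1 per adjacent pair of distinct terms."""
    tokens = tokenize(text)
    terms = {t.lower() for t in query_terms}
    hits = [i for i, tok in enumerate(tokens) if tok in terms]
    score = float(len(hits))
    # Proximity bonus: two different query terms in neighboring positions.
    for i, j in zip(hits, hits[1:]):
        if j - i == 1 and tokens[i] != tokens[j]:
            score += 1.0
    if normalize and tokens:
        score /= len(tokens)  # optional normalization by document length
    return score


query = ["computer", "architecture"]
d1 = "Computer architecture is the study of computer design"
d2 = "The architecture of the building housed a computer"
print(toy_rank(d1, query), toy_rank(d2, query))  # 4.0 2.0
```

Positions are counted after stopword removal, so two terms separated only by a stopword count as adjacent in this sketch.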
15
Semantics of Virtual Directories
Encountered some tradeoffs
What we did:
- Static virtual directories (search once on mkdir)
- Directory contents as a snapshot at one point in time
- Hard links
[Figure: example directory tree rooted at /CS736, with entries labeled project, papers, reading questions, %nfs%, writeup, NFS talk outline, NFS vs AFS, Thread ideas]
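The static semantics chosen above can be sketched with an in-memory toy file system: the query runs once at mkdir time, matches are hard-linked into the new directory, and later changes elsewhere do not alter the listing. The `ToyFS` class, its method names, and the file names are hypothetical, and the sketch assumes file names are unique across directories.

```python
class ToyFS:
    """In-memory stand-in for the file system; not the authors' implementation."""

    def __init__(self):
        self.content = {}   # inode -> text
        self.names = {}     # inode -> file name (assumed unique)
        self.nlinks = {}    # inode -> hard-link count
        self.dirs = {}      # directory path -> {entry name: inode}
        self.next_inode = 1

    def create(self, directory, name, text):
        inode = self.next_inode
        self.next_inode += 1
        self.content[inode] = text
        self.names[inode] = name
        self.nlinks[inode] = 1
        self.dirs.setdefault(directory, {})[name] = inode
        return inode

    def mkdir_virtual(self, directory, keyword):
        """Run the query once and hard-link every current match."""
        entries = {}
        for inode, text in self.content.items():
            if keyword.lower() in text.lower().split():
                entries[self.names[inode]] = inode
                self.nlinks[inode] += 1
        self.dirs[directory] = entries

    def unlink(self, directory, name):
        inode = self.dirs[directory].pop(name)
        self.nlinks[inode] -= 1
        if self.nlinks[inode] == 0:  # data is freed only with the last link
            del self.content[inode], self.names[inode], self.nlinks[inode]

    def readdir(self, directory):
        return sorted(self.dirs[directory])


fs = ToyFS()
fs.create("/CS736/papers", "nfs_vs_afs.txt", "NFS versus AFS notes")
fs.create("/CS736/project", "threads.txt", "thread ideas")
fs.mkdir_virtual("/CS736/nfs", "nfs")                             # snapshot taken here
fs.create("/CS736/papers", "nfs_talk.txt", "NFS talk outline")    # too late to appear
fs.unlink("/CS736/papers", "nfs_vs_afs.txt")                      # hard link keeps the data
print(fs.readdir("/CS736/nfs"))  # ['nfs_vs_afs.txt']
```

Because the virtual directory holds a hard link, removing the original name leaves the file reachable, which is the property a symbolic link would not give.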
16
Semantics of Virtual Directories
Encountered some tradeoffs
Alternatives (all also valid):
- Static virtual directory creation with symbolic links
  - Leads to dangling (broken) links
- Process query lazily on readdir command
  - Semantics used in Semantic File System paper
- Dynamically update contents of virtual directories on file creation, deletion, or write
  - Can be implemented using database triggers
  - More expensive, heavier back-end load
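Two of the alternatives above, lazy evaluation on readdir and trigger-driven updates, can be contrasted in a small sketch. The classes `LazyVirtualDir` and `TriggeredVirtualDir` and the `write` helper are hypothetical; in the real system the hook would be a database trigger rather than a Python call.

```python
class LazyVirtualDir:
    """Re-runs the query on every readdir, so the listing is always current."""

    def __init__(self, files, keyword):
        self.files = files              # shared dict: file name -> text
        self.keyword = keyword.lower()

    def readdir(self):
        return sorted(
            name for name, text in self.files.items()
            if self.keyword in text.lower().split()
        )


class TriggeredVirtualDir:
    """Keeps a materialized listing that write and delete hooks update."""

    def __init__(self, keyword):
        self.keyword = keyword.lower()
        self.entries = set()

    def on_write(self, name, text):
        if self.keyword in text.lower().split():
            self.entries.add(name)
        else:
            self.entries.discard(name)

    def on_delete(self, name):
        self.entries.discard(name)

    def readdir(self):
        return sorted(self.entries)


files = {}
lazy = LazyVirtualDir(files, "nfs")
triggered = TriggeredVirtualDir("nfs")


def write(name, text):
    files[name] = text
    triggered.on_write(name, text)  # plays the role of a database trigger


write("talk.txt", "NFS talk outline")
write("ideas.txt", "thread ideas")
write("ideas.txt", "thread ideas for NFS")  # the rewrite now matches the query
print(lazy.readdir(), triggered.readdir())
```

The lazy version pays the query cost on each readdir, while the triggered version pays a small cost on every write, which matches the heavier back-end load noted above.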
17
Roadmap
- Overview of System Design and Implementation
- Virtual Directories and Full-Text Queries
- Live Demonstration
- Conclusions & Future Work
18
Roadmap
- Overview of System Design and Implementation
- Virtual Directories and Full-Text Queries
- Live Demonstration
- Conclusions & Future Work
19
Conclusions
Benefits of our proxy architecture:
- Standard NFS clients
- Postgres as black box
- Simple to expose functionality of DB
- Use & add DB objects at will
20
Future Work
- Performance evaluation to understand the overhead of new functionality
  - Dynamic index maintenance (file creation & modification)
  - Virtual directory creation and text querying
- Block-level text writes and caching
- Query support for other file types
  - Mechanisms for extracting and indexing meta-data from additional file types (e.g., image files)
- Performance monitoring, adaptive indexing and storage format within the NFS proxy
21
Thanks! Questions?
Special Thanks: Remzi Arpaci-Dusseau, Alan Halverson, David DeWitt