What is IR?
- In the 70s and 80s, much of the research focused on document retrieval
- In the 90s, TREC reinforced the view that IR = document retrieval
- Document retrieval is important, e.g. Web search!
- But researchers work on a range of retrieval-related language technologies:
  - question answering
  - cross-lingual retrieval
  - distributed retrieval
  - topic detection and tracking
  - multimedia retrieval
  - summarization
IR and Database Systems
- Typically differentiated by unstructured/structured data
- What about marked-up text and semi-structured data?
- Recent database papers on:
  - nearest-neighbor and similarity search
  - distributed, peer-to-peer search
  - Web search
  - information extraction
  - text data mining
- Boundaries continue to get fuzzier
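The structured/unstructured distinction can be made concrete with a small sketch. The code below is illustrative only and is not taken from the slides: the toy records, the toy documents, and the helper names `select`, `tokenize`, and `rank` are all assumptions, and the TF-IDF weighting is just one simple stand-in for a ranking function. It contrasts exact-match selection over typed records (database style) with ranked retrieval over free text (IR style).

```python
import math
from collections import Counter

# Toy structured data (hypothetical): typed fields, exact-match semantics.
records = [
    {"id": 1, "topic": "databases", "year": 1999},
    {"id": 2, "topic": "retrieval", "year": 2001},
    {"id": 3, "topic": "retrieval", "year": 1999},
]

def select(rows, **conditions):
    """Database-style selection: a row either satisfies every condition or it does not."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]

# Toy unstructured data (hypothetical): free text, graded relevance.
docs = {
    "d1": "database systems store structured data",
    "d2": "information retrieval ranks unstructured text",
    "d3": "web search is document retrieval over text",
}

def tokenize(text):
    """Lowercase and split on whitespace; real systems do much more."""
    return text.lower().split()

def rank(query, collection):
    """IR-style retrieval: score each document with a simple TF-IDF sum and sort."""
    n = len(collection)
    tokens = {doc_id: tokenize(text) for doc_id, text in collection.items()}
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = sum(tf[t] * math.log(n / df[t]) for t in tokenize(query) if t in tf)
        if score > 0:
            scores[doc_id] = score
    # Best-scoring documents first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    print(select(records, topic="retrieval", year=1999))
    print(rank("unstructured text retrieval", docs))
```

The selection returns an unordered set of rows that match exactly, while the ranking returns documents ordered by a score, including partial matches. Semi-structured data sits between the two, which is one reason the boundary is hard to draw.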
IR and Database Systems
- Many proposals for database/IR integration
  - most recently in the XML context, but the idea goes back to the 70s
  - supporting a probabilistic framework is key
- Integration vs. cooperation
- Semantic Web
  - lessons from the IR world
  - semantics or statistics?