2003.10.30 - SLIDE 1IS 202 – FALL 2003 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003

Presentation transcript:

SLIDE 1 IS 202 – FALL 2003 Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003 SIMS 202: Information Organization and Retrieval Lecture 18: Statistical Properties of Texts and Vector Representation

SLIDE 2IS 202 – FALL 2003 Lecture Overview Review –Boolean Searching –Content Analysis Statistical Properties of Text –Zipf Distribution –Statistical Dependence Indexing and Inverted Files Vector Representation Term Weights Vector Matching Credit for some of the slides in this lecture goes to Marti Hearst


SLIDE 4IS 202 – FALL 2003 Boolean Queries Cat Cat OR Dog Cat AND Dog (Cat AND Dog) (Cat AND Dog) OR Collar (Cat AND Dog) OR (Collar AND Leash) (Cat OR Dog) AND (Collar OR Leash)

SLIDE 5 IS 202 – FALL 2003 Boolean Logic [Figure: diagram of sets A and B]

SLIDE 6 IS 202 – FALL 2003 Boolean Logic [Figure: three terms t1, t2, t3 partition documents D1–D11 into eight regions m1–m8; each region m is a conjunction of t1, t2, t3 with every term either present or negated. The negation marks were lost in extraction.]

SLIDE 7IS 202 – FALL 2003 Boolean Systems Most of the commercial database search systems that pre-date the WWW are based on Boolean search –Dialog, Lexis-Nexis, etc. Most Online Library Catalogs are Boolean systems –E.g., MELVYL Database systems use Boolean logic for searching Many of the search engines sold for intranet search of web sites are Boolean

SLIDE 8IS 202 – FALL 2003 Content Analysis Automated Transformation of raw text into a form that represents some aspect(s) of its meaning Including, but not limited to: –Automated Thesaurus Generation –Phrase Detection –Categorization –Clustering –Summarization

SLIDE 9IS 202 – FALL 2003 Techniques for Content Analysis Statistical –Single Document –Full Collection Linguistic –Syntactic –Semantic –Pragmatic Knowledge-Based (Artificial Intelligence) Hybrid (Combinations)

SLIDE 10IS 202 – FALL 2003 Text Processing Standard Steps: –Recognize document structure Titles, sections, paragraphs, etc. –Break into tokens Usually space and punctuation delineated Special issues with Asian languages –Stemming/morphological analysis –Store in inverted index (to be discussed later)
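The tokenizing and stemming steps above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the course: `tokenize` and `crude_stem` are hypothetical helpers, and the suffix stripper is deliberately naive (it is not the Porter stemmer).

```python
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crude_stem(token):
    """Strip a few common English suffixes (illustrative only, not Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

tokens = tokenize("Indexing systems: the system indexed 3 documents.")
stems = [crude_stem(t) for t in tokens]
print(stems)
```

Space-and-punctuation splitting like this does not work for languages written without word separators, which is the "special issues" point on the slide.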

SLIDE 11IS 202 – FALL 2003 Techniques for Content Analysis Statistical –Single Document –Full Collection Linguistic –Syntactic –Semantic –Pragmatic Knowledge-Based (Artificial Intelligence) Hybrid (Combinations)

SLIDE 12 Document Processing Steps From “Modern IR” Textbook

SLIDE 13IS 202 – FALL 2003 Errors Generated by Porter Stemmer From Krovetz ‘93

SLIDE 14IS 202 – FALL 2003 Lecture Overview Review –Boolean Searching –Content Analysis Statistical Properties of Text –Zipf Distribution –Statistical Dependence Indexing and Inverted Files Vector Representation Term Weights Vector Matching Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 15 IS 202 – FALL 2003 A Small Collection (Stems)
Rank Freq Term
1 37 system
2 32 knowledg
3 24 base
4 20 problem
5 18 abstract
6 15 model
7 15 languag
8 15 implem
9 13 reason
Ranks 10–18 (frequencies lost in extraction): inform, expert, analysi, rule, program, oper, evalu, comput, case
19 9 gener
20 9 form
Low-frequency stems (ranks and frequencies lost in extraction): enhanc, energi, emphasi, detect, desir, date, critic, content, consider, concern, compon, compar, commerci, clause, aspect, area, aim, affect

SLIDE 16 IS 202 – FALL 2003 The Corresponding Zipf Curve [Figure: frequency plotted against rank for the 20 most frequent stems, from rank 1 “system” (freq 37) to rank 20 “form” (freq 9)]

SLIDE 17IS 202 – FALL 2003 Zipf Distribution The Important Points: –A few elements occur very frequently –A medium number of elements have medium frequency –Many elements occur very infrequently
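A rank/frequency table of the kind shown for the small collection can be produced by counting tokens and sorting by count. The sketch below uses a made-up sentence as input; `rank_frequency` is a hypothetical helper name.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, frequency, term) tuples, most frequent term first."""
    counts = Counter(tokens)
    # Sort by descending frequency, breaking ties alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, freq, term) for rank, (term, freq) in enumerate(ordered, start=1)]

sample = "the cat and the dog and the bird saw the cat".split()
table = rank_frequency(sample)
for rank, freq, term in table:
    print(rank, freq, term)
```

Even in this tiny sample one word dominates and half of the distinct words occur only once, which is the shape the slide describes.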

SLIDE 18 Zipf Distribution [Figure: two plots of the same distribution, one on a linear scale and one on a logarithmic scale]

SLIDE 19 IS 202 – FALL 2003 Related Distributions/“Laws” Bradford’s Law of Scattering Lotka’s Law of Productivity De Solla Price’s Urn Model for “Cumulative Advantage Processes” [Figure: urn illustration, pick and replace +1, with proportions ½ = 50%, 2/3 = 66%, ¾ = 75%]

SLIDE 20IS 202 – FALL 2003 Frequent Words on the WWW the a to of and in s for on this is by with or at all are from e you be that not an as home it i have if new t your page about com information will can more has no other one c d m was copyright us (see

SLIDE 21IS 202 – FALL 2003 Word Frequency vs. Resolving Power The most frequent words are not the most descriptive (from van Rijsbergen 79)

SLIDE 22 IS 202 – FALL 2003 Statistical Independence Two events x and y are statistically independent if the product of the probabilities of their happening individually equals the probability of their happening together: $P(x)\,P(y) = P(x, y)$

SLIDE 23 IS 202 – FALL 2003 Lexical Associations Subjects write first word that comes to mind –doctor/nurse; black/white (Palermo & Jenkins 64) Text Corpora can yield similar associations One measure: Mutual Information (Church and Hanks 89): $I(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$ If word occurrences were independent, the numerator and denominator would be equal (if measured across a large collection)
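As a rough sketch, the mutual information measure can be computed from co-occurrence counts as log2 of P(x, y) divided by P(x) P(y). The counts below are hypothetical and are not taken from the AP corpus; `mutual_information` is an illustrative helper.

```python
import math

def mutual_information(count_xy, count_x, count_y, n):
    """Pointwise mutual information: log2( P(x, y) / (P(x) * P(y)) )."""
    p_xy = count_xy / n
    p_x = count_x / n
    p_y = count_y / n
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts in a collection of one million word positions.
independent = mutual_information(count_xy=10, count_x=1000, count_y=10000, n=1_000_000)
associated = mutual_information(count_xy=80, count_x=1000, count_y=10000, n=1_000_000)
print(round(independent, 3), round(associated, 3))
```

When the joint probability equals the product of the individual probabilities the score is about 0; a pair that co-occurs eight times more often than chance scores about 3.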

SLIDE 24IS 202 – FALL 2003 Interesting Associations with “Doctor” AP Corpus, N=15 million, Church & Hanks 89

SLIDE 25 IS 202 – FALL 2003 Un-Interesting Associations with “Doctor” AP Corpus, N=15 million, Church & Hanks 89 These associations were likely to happen because the non-doctor words shown here are very common and therefore likely to co-occur with any noun

SLIDE 26IS 202 – FALL 2003 Content Analysis Summary Content Analysis: transforming raw text into more computationally useful forms Words in text collections exhibit interesting statistical properties –Word frequencies have a Zipf distribution –Word co-occurrences exhibit dependencies

SLIDE 27IS 202 – FALL 2003 Lecture Overview Review –Boolean Searching –Content Analysis Statistical Properties of Text –Zipf Distribution –Statistical Dependence Indexing and Inverted Files Vector Representation Term Weights Vector Matching Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 28IS 202 – FALL 2003 Inverted Indexes We have seen “Vector files” conceptually –An Inverted File is a vector file “inverted” so that rows become columns and columns become rows

SLIDE 29IS 202 – FALL 2003 How Inverted Files are Created Dictionary Postings

SLIDE 30IS 202 – FALL 2003 Inverted Indexes Permit fast search for individual terms For each term, you get a list consisting of: –Document ID –Frequency of term in doc (optional) –Position of term in doc (optional) These lists can be used to solve Boolean queries: country -> d1, d2 manor -> d2 country AND manor -> d2 Also used for statistical ranking algorithms
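The country/manor example can be reproduced with a small in-memory index. This is a minimal sketch: the document texts are invented so that the postings match the slide, and `build_index` and `boolean_and` are hypothetical helper names. Term frequencies and positions, which the slide lists as optional, are left out.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of IDs of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def boolean_and(index, term_a, term_b):
    """Documents containing both terms: the intersection of two posting lists."""
    return sorted(set(index.get(term_a, [])) & set(index.get(term_b, [])))

# Hypothetical document texts chosen to match the postings on the slide.
docs = {"d1": "country road", "d2": "country manor house"}
index = build_index(docs)
print(index["country"], index["manor"], boolean_and(index, "country", "manor"))
```

An OR query would take the union of the two posting lists instead of the intersection.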

SLIDE 31IS 202 – FALL 2003 How Inverted Files are Used Dictionary Postings Query on “time” AND “dark” 2 docs with “time” in dictionary -> IDs 1 and 2 from posting file 1 doc with “dark” in dictionary -> ID 2 from posting file Therefore, only doc 2 satisfied the query

SLIDE 32IS 202 – FALL 2003 Lecture Overview Review –Boolean Searching –Content Analysis Statistical Properties of Text –Zipf Distribution –Statistical Dependence Indexing and Inverted Files Vector Representation Term Weights Vector Matching Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 33 IS 202 – FALL 2003 Document Vectors Documents are represented as “bags of words” Represented as vectors when used computationally –A vector is like an array of floating-point numbers –Has direction and magnitude –Each vector holds a place for every term in the collection –Therefore, most vectors are sparse

SLIDE 34IS 202 – FALL 2003 Vector Space Model Documents are represented as vectors in term space –Terms are usually stems –Documents represented by binary or weighted vectors of terms Queries represented the same as documents Query and Document weights are based on length and direction of their vector A vector distance measure between the query and documents is used to rank retrieved documents

SLIDE 35IS 202 – FALL 2003 Vector Representation Documents and Queries are represented as vectors Position 1 corresponds to term 1, position 2 to term 2, position t to term t The weight of the term is stored in each position

SLIDE 36IS 202 – FALL 2003 Document Vectors “Nova” occurs 10 times in text A “Galaxy” occurs 5 times in text A “Heat” occurs 3 times in text A (Blank means 0 occurrences.)

SLIDE 37IS 202 – FALL 2003 Document Vectors “Hollywood” occurs 7 times in text I “Film” occurs 5 times in text I “Diet” occurs 1 time in text I “Fur” occurs 3 times in text I
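The counts given for texts A and I can be written as sparse mappings and expanded into full vectors with one position per term. In this sketch the vocabulary is assumed to be just the terms named on the two slides; a real collection would have far more terms, which is why most positions are zero.

```python
# Assumed vocabulary: only the terms mentioned for texts A and I.
vocabulary = ["nova", "galaxy", "heat", "hollywood", "film", "diet", "fur"]

# Sparse form: only non-zero counts are stored.
text_a = {"nova": 10, "galaxy": 5, "heat": 3}
text_i = {"hollywood": 7, "film": 5, "diet": 1, "fur": 3}

def to_dense(sparse, vocabulary):
    """Expand a term->count mapping into a vector with one slot per term."""
    return [sparse.get(term, 0) for term in vocabulary]

vector_a = to_dense(text_a, vocabulary)
vector_i = to_dense(text_i, vocabulary)
print(vector_a)
print(vector_i)
```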

SLIDE 38IS 202 – FALL 2003 Document Vectors

SLIDE 39 IS 202 – FALL 2003 We Can Plot the Vectors [Figure: documents plotted on axes “Star” and “Diet”: a doc about astronomy, a doc about movie stars, a doc about mammal behavior]

SLIDE 40 IS 202 – FALL 2003 Documents in 3D Space Primary assumption of the Vector Space Model: Documents that are “close together” in space are similar in meaning

SLIDE 41 IS 202 – FALL 2003 Vector Space Documents and Queries [Figure: documents D1–D11 plotted in the space of terms t1, t2, t3, shown against the Boolean term combinations] Q is a query – also represented as a vector

SLIDE 42 IS 202 – FALL 2003 Documents in Vector Space [Figure: documents D1–D11 plotted against axes t1, t2, t3]

SLIDE 43IS 202 – FALL 2003 Lecture Overview Review –Boolean Searching –Content Analysis Statistical Properties of Text –Zipf Distribution –Statistical Dependence Indexing and Inverted Files Vector Representation Term Weights Vector Matching Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 44IS 202 – FALL 2003 Assigning Weights to Terms Binary Weights Raw term frequency tf*idf –Recall the Zipf distribution –Want to weight terms highly if they are Frequent in relevant documents … BUT Infrequent in the collection as a whole Automatically derived thesaurus terms

SLIDE 45IS 202 – FALL 2003 Binary Weights Only the presence (1) or absence (0) of a term is included in the vector

SLIDE 46IS 202 – FALL 2003 Raw Term Weights The frequency of occurrence for the term in each document is included in the vector
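The difference between binary and raw term weights is easy to see on a toy document. The token list and vocabulary below are hypothetical, as are the helper names.

```python
def raw_weights(tokens, vocabulary):
    """Raw term frequency: how many times each vocabulary term occurs."""
    return [tokens.count(term) for term in vocabulary]

def binary_weights(tokens, vocabulary):
    """Binary weight: 1 if the term occurs at all, otherwise 0."""
    return [1 if term in tokens else 0 for term in vocabulary]

vocabulary = ["nova", "galaxy", "heat", "film"]
tokens = ["nova", "galaxy", "nova", "heat"]  # hypothetical document
raw = raw_weights(tokens, vocabulary)
binary = binary_weights(tokens, vocabulary)
print(raw, binary)
```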

SLIDE 47IS 202 – FALL 2003 Assigning Weights tf*idf measure: –Term frequency (tf) –Inverse document frequency (idf) A way to deal with some of the problems of the Zipf distribution Goal: Assign a tf*idf weight to each term in each document

SLIDE 48IS 202 – FALL 2003 tf*idf

SLIDE 49IS 202 – FALL 2003 Inverse Document Frequency IDF provides high values for rare words and low values for common words For a collection of documents (N = 10000)
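The formula and the table of values did not survive in this transcript. As a sketch, assuming the common definition idf = log10(N / n), where n is the number of documents containing the term, and weight = tf × idf, the behaviour for N = 10000 looks like this. The document frequencies tried below are illustrative choices, not values from the slide.

```python
import math

def idf(n_docs, doc_freq):
    """Inverse document frequency, assuming idf = log10(N / n)."""
    return math.log10(n_docs / doc_freq)

def tf_idf(term_freq, n_docs, doc_freq):
    """Weight of a term in a document: term frequency times idf."""
    return term_freq * idf(n_docs, doc_freq)

N = 10000
# Illustrative document frequencies, from very rare to present everywhere.
values = {n_k: round(idf(N, n_k), 3) for n_k in (1, 20, 5000, 10000)}
print(values)
```

A term found in a single document gets the highest value, and a term found in every document gets 0, so it contributes nothing to the weight however often it occurs.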

SLIDE 50IS 202 – FALL 2003 Lecture Overview Review –Boolean Searching –Content Analysis Statistical Properties of Text –Zipf Distribution –Statistical Dependence Indexing and Inverted Files Vector Representation Term Weights Vector Matching Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 51IS 202 – FALL 2003 Similarity Measures Simple matching (coordination level match) Dice’s Coefficient Jaccard’s Coefficient Cosine Coefficient Overlap Coefficient
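For binary (set-valued) representations, the listed coefficients are usually defined in terms of the size of the intersection of the two term sets. The sketch below uses those standard set forms; the query and document term sets are made up.

```python
import math

def simple_match(a, b):
    """Coordination level match: number of shared terms."""
    return len(a & b)

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    return len(a & b) / len(a | b)

def cosine(a, b):
    return len(a & b) / math.sqrt(len(a) * len(b))

def overlap(a, b):
    return len(a & b) / min(len(a), len(b))

query = {"cat", "dog", "collar"}          # hypothetical query terms
doc = {"cat", "dog", "leash", "walk"}     # hypothetical document terms
for measure in (simple_match, dice, jaccard, cosine, overlap):
    print(measure.__name__, round(measure(query, doc), 3))
```

All but simple matching divide by some function of the set sizes, so a long document does not score highly just because it contains many terms.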

SLIDE 52IS 202 – FALL 2003 tf*idf Normalization Normalize the term weights (so longer vectors are not unfairly given more weight) –Normalize usually means force all values to fall within a certain range, usually between 0 and 1, inclusive

SLIDE 53 IS 202 – FALL 2003 Vector Space Similarity Now, the similarity of two documents is: $sim(D_i, D_j) = \sum_{k=1}^{t} w_{ik}\, w_{jk}$ This is also called the cosine, or normalized inner product –The normalization was done when weighting the terms

SLIDE 54IS 202 – FALL 2003 Vector Space Similarity Measure Combine tf and idf into a similarity measure

SLIDE 55IS 202 – FALL 2003 Computing Similarity Scores

SLIDE 56IS 202 – FALL 2003 What’s Cosine Anyway? “One of the basic trigonometric functions encountered in trigonometry. Let theta be an angle measured counterclockwise from the x-axis along the arc of the unit circle. Then cos(theta) is the horizontal coordinate of the arc endpoint. As a result of this definition, the cosine function is periodic with period 2pi.” From

SLIDE 57 IS 202 – FALL 2003 Cosine vs. Degrees [Figure: plot of cosine (vertical axis) against degrees (horizontal axis)]

SLIDE 58IS 202 – FALL 2003 Computing a Similarity Score

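SLIDE 59 IS 202 – FALL 2003 Vector Space Matching [Figure: vectors D1, D2 and Q plotted on axes Term A and Term B] $D_i = (d_{i1}, w_{d_{i1}};\ d_{i2}, w_{d_{i2}};\ \ldots;\ d_{it}, w_{d_{it}})$ $Q = (q_{i1}, w_{q_{i1}};\ q_{i2}, w_{q_{i2}};\ \ldots;\ q_{it}, w_{q_{it}})$ Q = (0.4, 0.8) D1 = (0.8, 0.3) D2 = (0.2, 0.7)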
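The two-term example can be checked numerically. The sketch below computes the cosine directly from unnormalized vectors, dividing the inner product by the product of the vector lengths; `cosine_similarity` is an illustrative helper, and the three vectors are the ones given on the slide.

```python
import math

def cosine_similarity(u, v):
    """Inner product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

Q = (0.4, 0.8)
D1 = (0.8, 0.3)
D2 = (0.2, 0.7)

sim_d1 = cosine_similarity(Q, D1)  # about 0.733
sim_d2 = cosine_similarity(Q, D2)  # about 0.983
print(round(sim_d1, 3), round(sim_d2, 3))
```

D2 points in nearly the same direction as Q, so it would be ranked above D1 for this query.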

SLIDE 60IS 202 – FALL 2003 Weighting Schemes We have seen something of –Binary –Raw term weights –TF*IDF There are many other possibilities –IDF alone –Normalized term frequency

SLIDE 61IS 202 – FALL 2003 Document Space Has High Dimensionality What happens beyond 2 or 3 dimensions? Similarity still has to do with how many tokens are shared in common More terms -> harder to understand which subsets of words are shared among similar documents One approach to handling high dimensionality: Clustering

SLIDE 62IS 202 – FALL 2003 Vector Space Visualization

SLIDE 63IS 202 – FALL 2003 Text Clustering Finds overall similarities among groups of documents Finds overall similarities among groups of tokens Picks out some themes, ignores others

SLIDE 64 IS 202 – FALL 2003 Text Clustering Clustering is “The art of finding groups in data.” -- Kaufman and Rousseeuw [Figure: documents plotted against axes Term 1 and Term 2]

SLIDE 65IS 202 – FALL 2003 Scatter/Gather Cutting, Pedersen, Tukey & Karger 92, 93, Hearst & Pedersen 95 Cluster sets of documents into general “themes”, like a table of contents Display the contents of the clusters by showing topical terms and typical titles User chooses subsets of the clusters and re-clusters the documents within Resulting new groups have different “themes”

SLIDE 66 IS 202 – FALL 2003 S/G Example: Query on “star” Encyclopedia text. Clusters: 14 sports; 8 symbols; 47 film, tv; 68 film, tv (p); 7 music; 97 astrophysics; 67 astronomy (p); 12 stellar phenomena; 10 flora/fauna; 49 galaxies, stars; 29 constellations; 7 miscellaneous. Clustering and re-clustering is entirely automated

SLIDE 70IS 202 – FALL 2003 Clustering Result Sets Advantages: –See some main themes Disadvantage: –Many ways documents could group together are hidden Thinking point: What is the relationship to classification systems and facets?

SLIDE 71IS 202 – FALL 2003 Dan Perkel on Cooper Are the problems that Cooper lays out the most pressing ones that web users face today? If not, what are some more pressing problems? Who are Cooper’s users? Regardless of answer to previous question, how adequate are his solutions? Where are strengths and weaknesses?

SLIDE 72IS 202 – FALL 2003 Simon King on Hearst Prof. Hearst mentions "an algorithm called TextTiling that automatically splits long documents into multi-paragraph subtopical units." Sounds nice, but what if the termsets/concepts you're searching on just happen to appear on opposite sides of one of the boundaries that TextTiling created? In plans for future work she mentions using an inverse distance measure rather than a fixed proximity constraint. This is good unless your search terms appear at the end of one section of a document and the beginning of the next (they're not separated by many words, but may not be related within the document.) Is one of these approaches clearly better than the other?

SLIDE 73IS 202 – FALL 2003 Simon King on Hearst Is there any reason that the optimal query size for Hearst's queries seems to be two or three concepts? Is this due to the way we write and think -- can't discuss more than a couple ideas at a time? Or is there some other reason?

SLIDE 74IS 202 – FALL 2003 Sean Savage on MIR 7 Considering these trends: –the proliferation of networked, mobile devices used at the front end in everyday information retrieval scenarios, and –the increase in cheap processing power and memory on the back end; And considering these facts: –mobile devices possess very limited input and output capabilities compared to those of desktop machines; and –most usage scenarios beyond the desktop involve significant constraints on the amount of time and attention that users can devote to these devices.

SLIDE 75IS 202 – FALL 2003 Sean Savage on MIR 7 Should we now focus most of our development resources in the realm of large-scale text transformations on improving the quality of search results (i.e., striving to improve precision and recall by pre-processing text, and by using categorization hierarchies at the front end to guide users in focusing queries), as opposed to directing those resources towards even more effective compression techniques to reduce query response times?

SLIDE 76IS 202 – FALL 2003 Sean Savage on MIR 7 Given the trends and facts above, should those in the IR community who work on text compression now focus on compressing small batches of text to be transmitted more efficiently across wireless networks, rather than on compressing the gigantic collections residing in databases, which this chapter seems to chiefly address?

SLIDE 77IS 202 – FALL 2003 Next Time Avi Rappoport of Searchtools.com on “Implementing Web Site Search Engines” –For discussion, please prepare by looking at some web sites with search capabilities (but NOT Ebay, Amazon, Google, Yahoo, or AllTheWeb) and find one that you like and one that you don’t Ray will be away from Tuesday-Friday next week, Marc will be in town, but at a conference all next week