DLLS 2003: Ontologically-based Searching for Jobs in Linguistics. Deryle Lonsdale. Funded by:

1 DLLS 2003: Ontologically-based Searching for Jobs in Linguistics. Deryle Lonsdale. Funded by:

2 The BYU Data Extraction Group: group of faculty (5) and students (15) from CS, Linguistics, SOAIS; goal: ontology-based data extraction; NSF funding: CISE/IIS/IDM TIDIE; website: papers, presentations, tools, demos.

3 The BYU Data Extraction Group

4 Overview: ontology-based extraction; building knowledge sources; jobs in linguistics (Sproat); putting it all together; some sample results.

5 Ontologies and IE [Diagram: Source, Target]

6 Document-based IE

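7 Conceptual modeling (OSM) [Diagram: the object set Car linked by "has" and "is for" relationships to Year, Price, Make, Mileage, Model, Feature, and PhoneNr, with PhoneNr linked to Extension; participation constraints are 0..1, 0..*, and 1..*.]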

8 Recognition and Extraction
    Car   Year  Make    Model      Mileage  Price  PhoneNr
    0001  1989  Subaru  SW                  $1900  (336)835-8597
    0002  1998          Elantra                    (336)526-5444
    0003  1994  HONDA   ACCORD EX  100K            (336)526-1081
    Car   Feature
    0001  Auto
    0001  AC
    0002  Black
    0002  4 door
    0002  tinted windows
    0002  Auto
    0002  pb
    0002  ps
    0002  cruise
    0002  am/fm
    0002  cassette stereo
    0002  a/c
    0003  Auto
    0003  jade green
    0003  gold

9 Car-Ads Ontology (textual)
    Car [->object];
    Car [0..1] has Year [1..*];
    Car [0..1] has Make [1..*];
    Car [0..1] has Model [1..*];
    Car [0..1] has Mileage [1..*];
    Car [0..*] has Feature [1..*];
    Car [0..1] has Price [1..*];
    PhoneNr [1..*] is for Car [0..*];
    PhoneNr [0..1] has Extension [1..*];
    Year matches [4]
      constant
        { extract "\d{2}";
          context "([^\$\d]|^)[4-9]\d[^\d]";
          substitute "^" -> "19"; },
      …
    End;
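To make the Year rule above concrete, here is a minimal Python sketch of how an extract/context/substitute triple might be applied. The two patterns are copied from the slide; the function name `extract_years` and the way the patterns are combined (find the context first, then pull the value out of it) are assumptions for illustration, not the group's actual implementation.

```python
import re

# Patterns copied from the Year data frame on the slide.
CONTEXT = re.compile(r"([^\$\d]|^)[4-9]\d[^\d]")
EXTRACT = re.compile(r"\d{2}")


def extract_years(text):
    """Return four-digit years for two-digit year mentions (hypothetical helper)."""
    years = []
    for ctx in CONTEXT.finditer(text):
        # Assumed semantics: the value is extracted from inside the context match.
        value = EXTRACT.search(ctx.group(0))
        if value:
            # substitute "^" -> "19": prefix the century to the two digits.
            years.append(re.sub(r"^", "19", value.group(0)))
    return years


two_digit = extract_years("'89 Subaru SW, $1900")
four_digit = extract_years("1998 Elantra")
```

In this sketch the price "$1900" is not mistaken for a year, because the context requires a non-digit, non-dollar character (or the start of the text) before the two digits. A four-digit year such as "1998" is also skipped by this particular rule; presumably one of the rules elided on the slide covers that case.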

10 The data-frame library: low-level patterns implemented as regular expressions; they match items such as email addresses, phone numbers, names, etc.
    Mileage matches [8]
      constant
        { extract "\b[1-9]\d{0,2}k";
          substitute "[kK]" -> "000"; },
        { extract "[1-9]\d{0,2}?,\d{3}";
          context "[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]";
          substitute "," -> ""; },
        { extract "[1-9]\d{0,2}?,\d{3}";
          context "(mileage\:\s*)[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]";
          substitute "," -> ""; },
        { extract "[1-9]\d{3,6}";
          context "[^\$\d][1-9]\d{3,6}\s*mi(\.|\b|les\b)"; },
        { extract "[1-9]\d{3,6}";
          context "(mileage\:\s*)[^\$\d][1-9]\d{3,6}\b"; };
      keyword "\bmiles\b", "\bmi\.", "\bmi\b", "\bmileage\b";
    end;
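The first two Mileage rules above can be sketched in Python as follows. The patterns come from the slide; the helper name `extract_mileage`, the case-insensitive matching of the "k" suffix (suggested by the `[kK]` substitution but not stated), and the order in which the rules are tried are assumptions.

```python
import re

# Rule 1: "100K" style values. Case-insensitivity is an assumption.
K_RULE = re.compile(r"\b[1-9]\d{0,2}k", re.IGNORECASE)

# Rule 2: comma-grouped values such as "85,000", not preceded by "$" or a digit.
COMMA_CONTEXT = re.compile(r"[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]")
COMMA_EXTRACT = re.compile(r"[1-9]\d{0,2}?,\d{3}")


def extract_mileage(text):
    """Return normalized mileage strings found in text (hypothetical helper)."""
    values = []
    for match in K_RULE.finditer(text):
        # substitute "[kK]" -> "000"
        values.append(re.sub(r"[kK]", "000", match.group(0)))
    for ctx in COMMA_CONTEXT.finditer(text):
        match = COMMA_EXTRACT.search(ctx.group(0))
        if match:
            # substitute "," -> ""
            values.append(match.group(0).replace(",", ""))
    return values


from_k = extract_mileage("1994 HONDA ACCORD EX 100K")
from_comma = extract_mileage("only 85,000 miles, $4,500")
```

Here the context pattern is what keeps the price "$4,500" from being read as a mileage, while "85,000" is accepted and normalized.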

11 Lexicons: repositories of enumerable classes of lexical information (FirstNames, LastNames, USstates, ProvoOremApts, CarMakes, Drugs, CampGroundFeats, etc.).
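A lexicon of this kind can be thought of as a simple membership test over tokens. The sketch below is an illustration only: the three entries in `CAR_MAKES` are toy values, and `find_lexicon_hits` is a hypothetical helper, not part of the group's tools.

```python
import re

# Toy stand-in for a CarMakes lexicon; the real repository's contents are not shown.
CAR_MAKES = {"subaru", "honda", "hyundai"}


def find_lexicon_hits(text, lexicon):
    """Return the tokens of text that appear in the lexicon, ignoring case."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [token for token in tokens if token.lower() in lexicon]


hits = find_lexicon_hits("1994 HONDA ACCORD EX 100K", CAR_MAKES)
```

Single-token lookup is the simplest possible design; multi-word entries (apartment names, for instance) would need phrase matching instead.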

12 Accessing the output: extracted information is stored in a relational database; results can be queried using SQL; a wide range of views is possible.
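As a rough illustration of querying extracted records with SQL, the sketch below loads the three car ads from the Recognition and Extraction slide into an in-memory SQLite database. The schema (table and column names), the choice of SQLite, and the assignment of values to columns are assumptions; only a subset of the extracted features is loaded.

```python
import sqlite3

# In-memory database; the schema is an assumption made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (
        id TEXT PRIMARY KEY, year INTEGER, make TEXT, model TEXT,
        mileage TEXT, price TEXT, phone TEXT
    );
    CREATE TABLE CarFeature (car_id TEXT, feature TEXT);
""")

# None marks a field that was not extracted for that ad.
cars = [
    ("0001", 1989, "Subaru", "SW", None, "$1900", "(336)835-8597"),
    ("0002", 1998, None, "Elantra", None, None, "(336)526-5444"),
    ("0003", 1994, "HONDA", "ACCORD EX", "100K", None, "(336)526-1081"),
]
features = [
    ("0001", "Auto"), ("0001", "AC"),
    ("0002", "Black"), ("0002", "Auto"),
    ("0003", "Auto"), ("0003", "jade green"),
]
conn.executemany("INSERT INTO Car VALUES (?, ?, ?, ?, ?, ?, ?)", cars)
conn.executemany("INSERT INTO CarFeature VALUES (?, ?)", features)

# Example view: automatic cars from 1990 or later.
rows = conn.execute("""
    SELECT c.id, c.year, c.model
    FROM Car AS c
    JOIN CarFeature AS f ON f.car_id = c.id
    WHERE f.feature = 'Auto' AND c.year >= 1990
    ORDER BY c.id
""").fetchall()
```

Keeping features in a separate table mirrors the one-to-many Car/Feature relationship in the ontology and lets a query combine constraints from both tables.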

13 Finding jobs in linguistics: LSA; email distribution lists (Corpora, Langage Naturel, CAAL/ACLA, etc.); usual commercial sites; word-of-mouth sources.

14 Sproat's analysis: random sample (224/2250) of LinguistList postings, 1994-2001; development vs. research, academic vs. industrial; linguists are most often (approx. 80% of the time) offered development jobs; linguists are hired more for specific tasks (e.g. grammar, lexicon development) than for more general research-oriented tasks (e.g. creating new technological approaches).

15 The banner years
    Year        Academia  Industry  % Industry
    1994        27        2         7%
    1995        45        5         10%
    1996        52        3         5%
    1997        48        3         6%
    1998        57        3         5%
    1999        56        14        20%
    2000        55        43        39%
    2001 (mid)  22        10        31%
    Dramatic rise in 1999, 2000; steep drop-off since 2001; rising demand for technical, computational skills.

16 Linguistic jobs ontology: Why? User-specifiable constraints. Somewhat closely follows existing ontologies (e.g. jobs, software).

17 Data frames and lexicons: language names (Ethnologue); (sub)fields of linguistics; tools, toolkits; software components, programming languages; linguistics-related job titles; activities; responsibilities; country names.

18 The corpus: 3237 postings (LinguistList, Corpora, LN, WoM); by year: 1998: 541, 1999: 575, 2000: 871, 2001: 952, 2002: 788; some noise (non-English, factored, program descriptions, attachments, etc.); semi-automatic edits (boilerplate, publicity blurbs about institutions, etc.).

19 Sample output

20 Observations: 270 postings don't contain linguist* (!); demand for knowledge of English equals that for all other languages combined (G, F, S, J, C); computer/computational background required for almost 1/3 (1116); noticeable amount of headhunting, particularly in the Seattle and DC areas.
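The linguist* observation amounts to a wildcard search over the postings. A minimal sketch, assuming the wildcard means "any word starting with linguist" and matching case-insensitively; the three sample postings are invented for illustration and are not from the corpus.

```python
import re

# Assumed reading of "linguist*": any word beginning with "linguist".
LINGUIST = re.compile(r"\blinguist\w*", re.IGNORECASE)


def count_without_linguist(postings):
    """Count postings with no word starting with 'linguist' (hypothetical helper)."""
    return sum(1 for text in postings if not LINGUIST.search(text))


# Invented sample postings, for illustration only.
sample = [
    "Computational Linguist wanted for grammar development",
    "Speech recognition engineer, ASR tuning",
    "Assistant Professor of Linguistics",
]
missing = count_without_linguist(sample)
```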

21 Programming languages

22 Popular subfields

23 Subfields (another perspective)

24 An engineering discipline? 160 linguistics jobs ending in "engineer". Software development cycle: research e., software design e.; development e., software e.; software quality e., linguistic test e., linguistic quality e.; linguistic support e., user experience e.; presales e., technical sales e. Specific subfields: web site e.; speech e., voice recognition e., speech recognition application e., speech e., ASR tuning e., audio e.; dialog e.; tools e.; AI e., NLP e.; knowledge e.; linguist e., natural language e.; staff e.; human factors e., user interface e.

25 Paradigms

26 Other observations: often a job title is not even listed (!); more i18n of data frames needed (e.g. email, ph. #); great need for (preferably hierarchical) lexical repositories related to linguistics: job titles; theoretical frameworks, subfields; typical linguist job activities; linguistic research/development venues.
