Speech tools
Jean-Philippe Goldman
03.03.2004
2 Two questions
- What kind of data?
- Which task?
3 What kind of data?
- Speech content (noise, multiple voices, …)
- Data file: sound / transcription / pitch curve
- Sampling/quantization: 16k, 12k, 8k, 4k; 8bit
- Size: 16k, 16bit: 256 kbps, 1.9 MB/min, 115 MB/h
- Format
  - Sound: wav, wma, mp3, ogg, aiff, aifc, au, vox, raw, sd, CSL, Ogg/Vorbis, NIST/Sphere
  - Transcription: HTK, TIMIT, TextGrid, Phondat
- Number of files
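The size figures on this slide (16 kHz, 16 bit, 256 kbps, about 1.9 MB per minute, about 115 MB per hour) follow from simple arithmetic on uncompressed audio. The sketch below reproduces them under two assumptions that the slide does not state: a single (mono) channel and decimal megabytes (1 MB = 1,000,000 bytes). The function name `pcm_data_rate` is made up for this illustration.

```python
def pcm_data_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Return (kbps, MB per minute, MB per hour) for uncompressed PCM audio.

    Assumes decimal units: 1 kbps = 1000 bit/s, 1 MB = 1,000,000 bytes.
    """
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    bytes_per_second = bits_per_second / 8
    kbps = bits_per_second / 1000
    mb_per_minute = bytes_per_second * 60 / 1_000_000
    mb_per_hour = bytes_per_second * 3600 / 1_000_000
    return kbps, mb_per_minute, mb_per_hour


# 16 kHz sampling, 16-bit quantization, mono (assumed)
kbps, mb_min, mb_hour = pcm_data_rate(16_000, 16)
print(f"{kbps:.0f} kbps, {mb_min:.2f} MB/min, {mb_hour:.1f} MB/h")
# prints: 256 kbps, 1.92 MB/min, 115.2 MB/h
```

The same function gives a quick estimate of disk space for a planned corpus, for example by multiplying the hourly figure by the expected number of recorded hours.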
4 Which task?
- Visualization and editing: record, play, edit, mix, add effects
- Analysis: spectral, pitch
- Speech manipulation: filtering, mixing, adding effects, prosodic manipulation
- Annotation: segmentation, labeling
- Scripting: batch processing, communication with external programs
- Plotting
5 Examples of tasks
- Build stimuli for an experiment (e.g. cross-splicing)
- Manage a speech database for a TTS engine
- Create a prosodic database
- Analyze a speech corpus from experiment recordings
- Verify/correct an automatic segmentation
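As an illustration of the first task, cross-splicing joins the beginning of one recording to the remainder of another. The sketch below works on plain lists of samples and is only a minimal outline: the function name `cross_splice` and the sample values are invented for this example, and it assumes both recordings share the same sampling rate and that the cut points have already been chosen (in practice, typically at zero crossings to avoid clicks).

```python
def cross_splice(first, second, cut_first, cut_second):
    """Return first[:cut_first] followed by second[cut_second:].

    `first` and `second` are sequences of samples at the same sampling rate
    (an assumption of this sketch); the cut points are sample indices.
    """
    if not 0 <= cut_first <= len(first):
        raise ValueError("cut_first is outside the first recording")
    if not 0 <= cut_second <= len(second):
        raise ValueError("cut_second is outside the second recording")
    return list(first[:cut_first]) + list(second[cut_second:])


# Toy sample values standing in for two recorded tokens
token_a = [1, 2, 3, 4, 5, 6]
token_b = [10, 20, 30, 40, 50, 60]

stimulus = cross_splice(token_a, token_b, cut_first=2, cut_second=3)
print(stimulus)  # [1, 2, 40, 50, 60]
```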
6 Two questions
- What kind of data?
- Which task?
Two rules
- No single tool does everything.
- There are plenty of ways to do one thing.
7 Tool features
- Visualization/editing
- Analysis
- Speech manipulation
- Annotation
- Scripting
- Plotting
- Supported formats
- Platform/installation
- Evolution/community
- Accessibility
- Price
8 Software
- Goldwave (audio editor)
- Esps Xwaves (routines + visualization)
- Praat (speech analysis)
- Wavesurfer (speech editor)
- Transcriber (annotation tool)
- Matlab (general-purpose software)
- OGI speech tools (routines + application development)
- … WinPitch, PitchWorks, Phonedit, CoolEdit …
9 Goldwave
Self-described as a “top rated, professional digital audio editor”
10 Goldwave
- Pros: editing (good memory management for big files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, effect chains, easy interface
- Cons: nothing for speech (pitch, formants), Windows only, no scripting
Good for file editing, not for speech.
12 Esps - Waves
- Developed by Entropic + AT&T. Now public.
- The comp.speech FAQ says:
  - Esps: a comprehensive set of speech analysis/processing tools
  - Waves: a graphical front-end for speech processing (waveforms, spectrograms, pitch); includes a signal labeling utility
14 Esps - Waves
- Pros: powerful, designed for big files
- Cons: Unix only (FreeBSD), non-standard formats, requires programming skills, development has stopped
15 Praat
- Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam
- General-purpose speech tool: editing, segmentation and labeling, prosodic manipulation
17 Praat
- Pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation
- Cons: limited scripting language, uses its own native formats for transcription and pitch files
18 WaveSurfer
- Open-source tool for sound visualization and manipulation
- Speech/sound analysis and sound annotation/transcription
- Platform for more advanced/specialized applications: extend WaveSurfer with new custom plug-ins or embed WaveSurfer visualization components in other applications
- Requires the Snack Sound Toolkit
20 Transcriber
- Authors: C. Barras, E. Geoffrois
- Relies on Snack (Tcl/Tk)
- Good for annotation
- Nice, simple GUI
- No speech analysis
22 Matlab (Mathworks)
- Mathematical environment
- Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction
- voicebox (2002), mike.brookes@ic.ac.uk
- Pitch determination algorithm (2002), Xuejing Sun, sunxj@northwestern.edu
- colea speech editor (1998), Philip Loizou, Univ. of Texas-Dallas, loizou@utdallas.edu
23 Matlab (Mathworks)
- Pros: open, powerful, scripting, excellent plotting
- Cons: poor speech community and standards, not designed for big files
24 OGI speech tools/CSLU Toolkit
- Development started in 1992 in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI
- Includes:
  - An X Window display tool (LYRE) to display and edit speech signals, spectrograms, phoneme labels, and other information
  - A set of C library routines (LIBNSPEECH)
  - Utilities for converting file formats, filtering, neural network training, and vector quantization, plus a database utility to automate speech-database queries
  - A set of Perl scripts, used mainly to automate the use of the OGI Speech Tools
  - Man pages
- RAD (rapid application development)
- Points of entry at three levels: package (C), script (Tcl), GUI (Tk)
- Free for research use
26 Summary
Columns: Edit | Anal | Manip | Annot | Script | Plot | Format | OS | Evolut. | Comm | Price
- Goldwave: win, $40
- Esps Waves: C, sh, Unix, free
- Praat: yes, native, console, sendpraat, src, free
- wavesurfer: +snack, C, tcl/tk, python, src, free
- transcriber: xml, free
- OGI Toolkit: free
- matlab: + Sigproc + packages, native, no BSD, stud. $100, $40/tbx
Legend: [symbol] = yes, but requires some development
27 Expect to do conversions
- Sound files: Goldwave (Windows), sox (Unix)
- Transcription files: scripts to convert text-formatted label files
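A label-conversion script of the kind mentioned on this slide can be very short. The sketch below reads HTK-style label lines (`start end label`, with times as integers in units of 100 ns) and writes one tab-separated line per segment with times in seconds. The tab-separated target layout, the function names, and the example labels are assumptions chosen for illustration; a real script would write whatever format the destination tool expects.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


def htk_to_seconds(lines):
    """Parse 'start end label' lines into (start_s, end_s, label) tuples."""
    segments = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # this sketch skips blank or incomplete lines
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        segments.append(
            (start / HTK_UNITS_PER_SECOND, end / HTK_UNITS_PER_SECOND, label)
        )
    return segments


def to_tab_separated(segments):
    """Format segments as 'start<TAB>end<TAB>label' lines (hypothetical target)."""
    return "\n".join(
        f"{start:.3f}\t{end:.3f}\t{label}" for start, end, label in segments
    )


# Made-up example labels
example = ["0 2500000 sil", "2500000 4100000 b", "4100000 6000000 a"]
print(to_tab_separated(htk_to_seconds(example)))
```

Keeping parsing and formatting in separate functions means only the second function needs replacing to target a different label format.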
28 Links
- www.goldwave.com
- www.speech.kth.se/software/#esps
- www.praat.org
- www.speech.kth.se/software/#wavesurfer
- www.cse.ogi.edu/toolkit
- www.mathworks.com (Matlab)
- www.lpl.univ-aix.fr/~sqlab/ (phonedit)
- www.sciconrd.com/pworks.htm (PitchWorks)
- www.winpitch.com (WinPitch)
- www.adobe.com (CoolEdit > Audition)