Master Thesis Projects  

The Section for Digital Signal Processing offers projects within neural networks, adaptive signal processing, multimedia signal processing, and biomedical signal processing.

Teachers: Lars Kai Hansen (LKH) and Jan Larsen (JL).

Relevant Courses: 04361 - 04363 - 04364 - 04365 - 04461 - 04462.

Project Titles:


Intelligent Mobile Phones

Project Supervisor

Jan Larsen

Project Partner

Simulation and Algorithm (SIMAL), DSP/NMP, Nokia.
Project 6 is in collaboration with Dept. of Acoustic Technology, DTU.

Background

The area of intelligent mobile phones is progressing rapidly. Mobile phones are being integrated with ideas from handheld and wearable computers to reach the ultimate goal: a wireless personal digital assistant. Whereas the transmitter/receiver unit was the most important component in the first generations of mobile phones, future mobile phones will most likely be differentiated by the multimedia applications they offer. There is consequently an increasing need for development of applications like speech recognition, intelligent web browsing and signal/image processing. Further practical considerations involve robustness to environmental noise, user friendliness and easy customization.

Topics for M.Sc. project

1. Decimation For Reducing Complexity of Speech Recognition
In speech recognition, a piece-wise stationary statistical model of speech is constructed from a sequence of so-called feature vectors. The feature vectors are the result of a speech preprocessor, which typically computes a set of coefficients every 10 ms based on a fixed window of 20-30 ms of a speech signal. Typically the feature vectors are composed of LPC or so-called cepstral coefficients. In recent studies, it has been observed that under certain conditions the stream of feature vectors from the preprocessor can be down-sampled by a factor of 2-5 without loss in recognition performance. These studies have primarily focused on so-called whole-word based speech recognition, where a separate model is created for each word in the recognition vocabulary. Unfortunately, whole-word based speech recognition has the limitation that only fairly small vocabularies are feasible in practice, as a separate model is required for each word. Therefore, more and more systems are today based on so-called sub-word models (phonemes) from which any word of a given language can be constructed.

In this project, decimation (down-sampling) of the preprocessor output for sub-word based speech recognition is investigated. Both fixed decimation, where the down-sampling factor is fixed, and variable decimation, where the down-sampling factor is set according to the difference between consecutive feature vectors, can be investigated. For evaluating the decimation techniques, a sub-word based speech recognition engine will be provided.
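The two decimation schemes can be illustrated with a minimal Python sketch. It is not the method prescribed by the project: the function names, the use of Euclidean distance, the `threshold` rule (keep a frame only when it differs enough from the last kept frame) and the toy feature vectors are all assumptions made for illustration.

```python
import math


def fixed_decimate(frames, factor):
    """Keep every `factor`-th feature vector, starting with the first."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return frames[::factor]


def variable_decimate(frames, threshold):
    """Keep a frame only when it differs enough from the last kept frame.

    The Euclidean distance and the threshold rule are illustrative
    assumptions; other difference measures could be used instead.
    """
    if not frames:
        return []
    kept = [frames[0]]
    for frame in frames[1:]:
        if math.dist(frame, kept[-1]) > threshold:
            kept.append(frame)
    return kept


# Toy 2-dimensional "feature vectors" (made-up numbers, one per frame).
frames = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (2.0, 2.0), (2.1, 2.0), (4.0, 0.0)]
fixed = fixed_decimate(frames, 2)
variable = variable_decimate(frames, 1.0)
```

With fixed decimation the output rate is known in advance, whereas the variable scheme keeps more frames where the signal changes quickly and fewer in stationary regions.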

2. Improved Preprocessing for Speech Recognition
In speech recognition, a piece-wise stationary statistical model of speech is constructed from a sequence of so-called feature vectors. The feature vectors are the result of a speech preprocessor, which typically computes a set of coefficients every 10 ms based on a fixed window of 20-30 ms of a speech signal. Typically the feature vectors are composed of LPC or so-called cepstral coefficients. In recent studies, it has been observed that under certain conditions the stream of feature vectors from the preprocessor can be down-sampled by a factor of 2-5 without loss in recognition performance. These studies have primarily focused on so-called whole-word based speech recognition, where a separate model is created for each word in the recognition vocabulary. Unfortunately, whole-word based speech recognition has the limitation that only fairly small vocabularies are feasible in practice, as a separate model is required for each word. Therefore, more and more systems are today based on so-called sub-word models (phonemes) from which any word of a given language can be constructed.

In this project, decimation (down-sampling) of the preprocessor output for sub-word based speech recognition is investigated. Both fixed decimation, where the down-sampling factor is fixed, and variable decimation, where the down-sampling factor is set according to the difference between consecutive feature vectors, can be investigated. For evaluating the decimation techniques, a sub-word based speech recognition engine will be provided.

3. Out-Of-Vocabulary Word Rejection for Speech Recognition
This project aims at investigating utterance rejection algorithms for small-vocabulary isolated word recognition such as name dialing and command word recognition. In name dialing applications for portable devices like mobile phones, the ability of the speech recognizer to reject utterances is very important. If rejection is not used, an erroneous recognition may result and consequently the phone will place a call to a wrong number. This situation is likely to happen especially in noisy conditions or if the user utters a name which is not part of the recognizer vocabulary (an out-of-vocabulary word). Most utterance rejection algorithms are based on so-called log likelihood ratios, that is, the logarithm of the ratio between the probability of the "winning" model and that of the second-best model or a "filler" or "garbage" model. When the ratio is above some threshold the recognition has high confidence, whereas a low ratio implies low confidence and consequently the utterance should be rejected. Unfortunately, the threshold that gives a good trade-off between rejection and recognition is very sensitive to the signal-to-noise ratio (SNR). In some applications the SNR can be estimated only roughly based on a single utterance. It is therefore desirable to develop a rejection measure which is less sensitive to SNR. One possibility in this direction is to use a posterior probability based measure, or to set the rejection threshold proportional to the log likelihood ratio between speech and non-speech segments of the waveform.

The project starts with a literature study of current methods, followed by an evaluation and possibly improvement of a few selected approaches. The selected approaches must be suitable primarily for so-called sub-word (phoneme) based isolated word recognition. In the project, various phoneme based speech recognition engines will be available for evaluating the developed rejection algorithms.
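As a rough illustration of the rejection measures mentioned above, the following Python sketch computes a log likelihood ratio between the best and second-best models and a posterior probability for the best model. The function names, the equal-prior assumption and the example scores are hypothetical; a real recognizer would supply the per-model log-likelihoods.

```python
import math


def log_likelihood_ratio(log_likes):
    """Difference between the best and second-best model log-likelihoods."""
    ranked = sorted(log_likes.values(), reverse=True)
    return ranked[0] - ranked[1]


def posterior_of_best(log_likes):
    """Posterior of the best model, assuming equal priors over all models."""
    best = max(log_likes.values())
    # Subtract the maximum before exponentiating for numerical stability.
    total = sum(math.exp(v - best) for v in log_likes.values())
    return 1.0 / total


def recognize(log_likes, llr_threshold):
    """Return the winning word, or None when the utterance is rejected."""
    if log_likelihood_ratio(log_likes) < llr_threshold:
        return None
    return max(log_likes, key=log_likes.get)


# Made-up log-likelihood scores for two utterances.
confident = {"anna": -100.0, "peter": -120.0, "garbage": -130.0}
ambiguous = {"anna": -100.0, "peter": -100.5, "garbage": -101.0}
```

Here `recognize(confident, 5.0)` accepts the word, while `recognize(ambiguous, 5.0)` rejects the utterance because the two best models score almost equally. The fixed `llr_threshold` is exactly the quantity that the text describes as sensitive to SNR.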

4. Rate of Speech and Phoneme Count Estimation for Speech Recognition
This project investigates and compares methods for estimating the speaking rate of a speaker, also known as the rate of speech (ROS). The ROS is typically measured in terms of the number of sound units (phonemes) per time unit. A reliable estimate of ROS can be used to improve the performance of a speech recognizer significantly, as it is well known that performance is very poor for very fast or very slow speakers. The poor performance for these speakers can be improved by taking the ROS into account, e.g., by phoneme duration modelling according to the estimated ROS for each phoneme, or by using separate models for "outlier" ROS speakers. The ROS estimator can also be used for estimating the number of phonemes in an utterance. The phoneme count estimate can be used for constraining the recognition task to words of a particular length, so as to improve performance and reduce recognition complexity.

In this project various methods for ROS and phoneme count estimation based on the speech recognizer itself are compared to neural network based approaches. Both standard feed-forward and feed-back networks can be evaluated.
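The relation between ROS and phoneme count can be sketched in a few lines of Python. The sketch assumes that a recognizer has already produced phoneme segments as `(start_s, end_s)` pairs in seconds; this input format, the function names and the numbers are hypothetical.

```python
def rate_of_speech(phoneme_segments):
    """Phonemes per second from (start_s, end_s) segments of one utterance."""
    if not phoneme_segments:
        raise ValueError("at least one phoneme segment is required")
    start = phoneme_segments[0][0]
    end = phoneme_segments[-1][1]
    return len(phoneme_segments) / (end - start)


def estimate_phoneme_count(ros, duration_s):
    """Expected number of phonemes in an utterance of the given duration."""
    return round(ros * duration_s)


# Made-up segmentation of a 0.5 s utterance into four phonemes.
segments = [(0.0, 0.1), (0.1, 0.25), (0.25, 0.4), (0.4, 0.5)]
ros = rate_of_speech(segments)
count = estimate_phoneme_count(ros, 0.75)
```

An estimate like `count` could then be used to restrict the search to vocabulary words of similar length.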

5. Text-to-phoneme mapping with Hidden Markov Models
Speaker-independent speech recognition systems often employ statistical phoneme models to model a specific language. In order to recognize a pre-specified vocabulary, the "spelled" word strings must be translated into strings of phonemes. In languages like Japanese or Finnish this is easy, since the pronunciation is uniquely determined from the spelling. In other languages the translation is not known in advance and a dictionary of phonetic transcriptions is needed. However, due to their large size, such dictionaries are not suitable for handheld devices like mobile phones. A widely used approach for statistical Text-To-Phoneme (TTP) modelling is the so-called decision tree. However, if the vocabulary in the application is unconstrained, the decision trees will typically be very large in order to provide an acceptable mapping accuracy.

In this project, alternative methods for text-to-phoneme mapping are compared to the decision tree based approach. The main objective is that the developed model should be significantly smaller than the decision tree model without compromising the mapping accuracy. Potential frameworks to consider are so-called Hidden Markov Models (HMM) and feed-forward or recurrent neural networks.

6. Pre-processing for Speech Recognition Using Advanced Auditory Models
A normal automatic speech recognition system includes a pre-processing module, which extracts features from the audio waveform. Mel Frequency Cepstral Coefficient (MFCC) features have more or less been adopted by the speech processing community as a standard. MFCCs model the basilar membrane by a mel-scaled frequency axis, and turn the convolution with the vocal tract into a sum by using the cepstrum instead of the spectrum. MFCCs provide a very good front end when the speech is relatively clean, whereas in noisy environments the performance of the recognizer may be degraded quite severely.
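The mel-scaled frequency axis can be made concrete with a short Python sketch. It uses one commonly quoted mel formula, m = 2595 log10(1 + f/700); the project description does not specify which variant is intended, so the formula and the function names are assumptions. The cepstral step relies on the identity log(|S| |H|) = log|S| + log|H|, which is why a convolution in time becomes a sum in the cepstral domain.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mel (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_frequencies(f_low, f_high, n):
    """Return n centre frequencies in Hz, equally spaced on the mel axis."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n)]


# Five filter centres between 0 Hz and 4 kHz (illustrative values).
centres = mel_spaced_frequencies(0.0, 4000.0, 5)
```

The resulting centre frequencies are densely spaced at low frequencies and increasingly far apart at high frequencies.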

In order to improve the noise robustness of the recognizer, this project will focus on applying advanced auditory models for feature extraction, such as the PEMO model developed by the Medical Physics Group at Oldenburg University (http://medi.uni-oldenburg.de/members/juergen/asr.html) or the Auditory Image Model from the Medical Research Council, Cambridge (ftp://ftp.essex.ac.uk/pub/omard/dsam/).

Desired prerequisites

Basic knowledge of signal processing, e.g., acquired through course 04361/04362 Digital Signal Processing, 04364 Non-linear Signal Processing or project and individual courses. It will be possible to improve relevant signal processing skills through individual courses. Further, in particular for project 6, 51231 Acoustic Communication is desirable.

Work plan

The listed topics cover a large area and can be theoretically as well as practically oriented according to your wishes. The project starts with a literature study and problem formulation. Development of algorithms is carried out in MATLAB and/or C, and subsequently the algorithms are tested and compared, often using relevant real-world data sets. Parts of the work can be carried out at Nokia's location in Sydhavnen.



Segmentation of magnetic resonance images (MRI) using finite element methods

Project Supervisor

Lars Kai Hansen

Background

Deformable models (such as "snakes") are one approach to the problem of finding curves (2D surfaces) in 2D images. These methods have the advantage of solving two problems simultaneously: (1) finding the pixels that belong to the surface, and (2) representing the surface for visualization, interpretation or other purposes. In this project we will use the Finite Element method to solve the resulting equations, thereby obtaining good algorithmic complexity and numerical stability.

Objective

To formulate and implement a Finite Element based method for surface extraction in 2D MRI images using energy-minimizing curves. To be less sensitive to initialization, a "pressure" or "weight" force will be used. These forces will simulate an inflating balloon or gravity.
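The effect of the pressure force can be illustrated with a small Python sketch of one update step for a closed contour. Note that this is a plain explicit finite-difference update, not the Finite Element formulation the project calls for, and that image forces are left out entirely. The function names and the parameters `alpha`, `pressure` and `step` are assumptions for illustration.

```python
import math


def snake_step(points, alpha, pressure, step):
    """One explicit update of a closed contour of (x, y) points.

    Points must be listed counter-clockwise. Only an elastic force and a
    balloon (pressure) force are applied; image forces are omitted.
    """
    n = len(points)
    updated = []
    for i, (x, y) in enumerate(points):
        px, py = points[i - 1]          # previous neighbour (wraps around)
        qx, qy = points[(i + 1) % n]    # next neighbour (wraps around)
        # Elastic force: discrete second derivative along the curve.
        ex, ey = px + qx - 2.0 * x, py + qy - 2.0 * y
        # Outward unit normal from the central-difference tangent.
        tx, ty = qx - px, qy - py
        length = math.hypot(tx, ty) or 1.0
        ox, oy = ty / length, -tx / length
        updated.append((x + step * (alpha * ex + pressure * ox),
                        y + step * (alpha * ey + pressure * oy)))
    return updated


def mean_radius(points):
    """Mean distance of the contour points from their centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)


# Unit circle sampled at 16 points, counter-clockwise.
circle = [(math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16))
          for k in range(16)]
inflated = snake_step(circle, alpha=0.0, pressure=1.0, step=0.1)
shrunk = snake_step(circle, alpha=1.0, pressure=0.0, step=0.1)
```

With only the elastic term the contour contracts, which is why a poorly initialized snake can collapse; the pressure term pushes it outward like an inflating balloon until, in a full implementation, image forces stop it at an edge.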

Work Plan

The project starts with a study of the Finite Element method and deformable models for segmentation. A deformable-model algorithm is then formulated and implemented on a suitable computer platform. The method is finally tuned and evaluated on MRI images from Rigshospitalet.

Desired Prerequisites

Digital Signal Processing (04361).



Bayes reconstruction in PET imaging

Project Supervisor

Lars Kai Hansen.

Background

Reconstruction of PET images in 2D or 3D is limited by considerable noise in the recorded sinograms from which the reconstruction is performed. The signal-to-noise ratio of the sinogram is often poor, which is visible in the reconstructed images/volumes.

Objective

To introduce a priori knowledge into the reconstruction process. One or more Bayes-based reconstruction algorithms are to be tested in 2D, and preferably in 3D, with a view to fusing functional data (PET or fMRI) and anatomical data (MRI).

Work Plan

After a literature study of reconstruction methods and Bayesian theory, a Bayes-based reconstruction algorithm is formulated. The algorithm is implemented and applied to the analysis of PET and fMRI data from Rigshospitalet and Hvidovre Hospital.
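The idea of adding prior knowledge to a reconstruction can be shown with a toy maximum a posteriori (MAP) estimate in Python. The sketch assumes Gaussian noise and a zero-mean Gaussian prior on the pixel values, which makes the MAP estimate the minimizer of 0.5*||y - A x||^2 + 0.5*lam*||x||^2. These are simplifying assumptions: PET counts are usually modelled as Poisson, and an anatomical (MRI-based) prior would be more structured. The system matrix `A`, the data `y` and the parameter names are made up.

```python
def map_reconstruct(A, y, lam, step=0.1, iterations=500):
    """Gradient descent on 0.5*||y - A x||^2 + 0.5*lam*||x||^2."""
    n_meas, n_pix = len(A), len(A[0])
    x = [0.0] * n_pix
    for _ in range(iterations):
        # Residual of the forward projection: A x - y.
        residual = [sum(A[i][j] * x[j] for j in range(n_pix)) - y[i]
                    for i in range(n_meas)]
        # Gradient: A^T (A x - y) + lam * x.
        grad = [sum(A[i][j] * residual[i] for i in range(n_meas)) + lam * x[j]
                for j in range(n_pix)]
        x = [x[j] - step * grad[j] for j in range(n_pix)]
    return x


# Toy system: three noisy "projection" sums of a two-pixel image.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.1, 1.9, 3.2]
no_prior = map_reconstruct(A, y, lam=0.0)
with_prior = map_reconstruct(A, y, lam=1.0)
```

With `lam=0.0` the result is the ordinary least-squares fit; increasing `lam` pulls the estimate towards the prior mean, trading data fidelity for noise suppression.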

Desired Prerequisites

Digital Signal Processing (04361), Advanced Digital Signal Processing (04365), individual course at the Section for Digital Signal Processing, IMM.




Signal Processing in Hearing Aids

Project Supervisor

Jan Larsen, Lars Kai Hansen

Project Partners

Oticon A/S

Background

The digitalization of hearing aids enables the construction of advanced signal processing algorithms.

Description

Starting from relevant literature and previous thesis work, the projects will investigate new algorithms. Further, it will be possible to test some of the algorithms on special hardware platforms.


Desired Prerequisites

Digital Signal Processing (04361) and/or an individual course at the Section for Digital Signal Processing, IMM.

Work Plan

Literature review and problem formulation; development of general algorithms, mainly in MATLAB and/or C; test and analysis; and possible implementation on special processor platforms.



Neural Networks: design, optimization, evaluation and visualization

Project Supervisors

Lars Kai Hansen and Jan Larsen.

Background

When building adaptive models of statistical systems, the choice of model complexity must always be considered carefully. A model that is too small will make systematic errors, while a model that is too large will be poorly determined and will therefore typically mishandle data that were not part of the adaptive process. Design of neural networks for specific applications, e.g., medical informatics, brain mapping, or engine monitoring, implements an optimal model choice through network pruning and test-error estimation. Visualization and Internet programming (HTML, VRML, Java) are important for communicating results to users.

Description

The literature study is based on articles and on browsing the web sites of the most important neural network groups. A "designer network" simulator is implemented, e.g., in MATLAB or C. The algorithms are applied to a current problem from an external partner (e.g., Rigshospitalet). The results are visualized and published via the Internet.

Work Plan

The topic is very broad, and one can focus on the mathematical-statistical, programming, or application-oriented aspects of neural networks.

Desired Prerequisites

Digital Signal Processing (04361), Nonlinear Signal Processing (04364), Advanced Digital Signal Processing (04365), individual courses.


 

Last modified January 30, 2000
Write to the DSP, IMM webmaster at www@eivind.imm.dtu.dk
© Copyright 2000 by Section for DSP, IMM.