Parallel Implementation Of Word Alignment Model: IBM MODEL 1 (presentation transcript)


1 Parallel Implementation Of Word Alignment Model: IBM MODEL 1
Professor: Dr. Azimi
Fateme Ahmadi-Fakhr, Afshin Arefi, Saba Jamalian
Dept. of Electrical and Computer Engineering, Shiraz University
General-purpose Programming of Massively Parallel Graphics Processors

2 Machine Translation
• Suppose we are asked to translate a foreign sentence f into an English sentence e:
  f: f_1 … f_m
  e: e_1 … e_l
• What should we do?
• For each word in the foreign sentence f, we find the most appropriate English word.
• Based on our knowledge of the English language, we change the order of the generated English words.
• We might also need to change the words themselves.
[Figure: f_1 f_2 f_3 … f_m → e_1 e_2 e_3 … e_m (word by word) → e_1 e_3 e_2 e_{m+1} … e_l (reordered and changed)]

3 Example
Foreign (Persian) sentence: امروز صبح به مدرسه رفتم (word by word: today / morning / to / school / went)
Translation model, finding the most appropriate English word for each word: today morning to school went
Language model, reordering and changing the words: today morning went to school → this morning I went to school

4 Statistical Translation Models
Finding the most appropriate English word for each word of امروز صبح به مدرسه رفتم is the job of the translation model, e.g.
t(go | رفتم) > t(x | رفتم) for every other English word x
• To find the maximum, the machine must know t(e|f) for all possible e and f.
• The machine must therefore be trained: IBM Models 1-5 calculate t(f|e).
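Finding the maximum for one foreign word is an argmax over one row of the translation table. The sketch below is host-side code (it compiles as part of a .cu file) and assumes a hypothetical dense table stored row-major with one row per foreign word f, so that `t[f * num_e + e]` holds t(e|f); the function name `best_english_word` is illustrative only.

```cuda
// Host-side sketch: pick the English word e that maximizes t(e|f).
// Hypothetical layout: t[f * num_e + e] == t(e|f).
#include <cassert>

int best_english_word(const float* t, int num_e, int f) {
    const float* row = t + f * num_e;  // the distribution t(.|f)
    int best = 0;
    for (int e = 1; e < num_e; ++e) {
        if (row[e] > row[best]) best = e;  // ties keep the lower index
    }
    return best;
}
```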

5 IBM Model 1 (Brown et al. [1993])
Corpus (large body of text) → Model 1 → t(f|e) for all e and f in the corpus

6 IBM Model 1 (Brown et al. [1993])
Choose an initial value of t(f|e) for all f and e, then repeat the following steps until convergence:

7 IBM Model 1 (Brown et al. [1993])
[Figure: table t(f|e) with one row per English word e_i and one column per foreign word f_j]
The problem is to find t(f|e) for all e and f. The entry t(f_j|e_i) gives how probable it is that f_j is the translation of e_i.

8 IBM Model 1 (Brown et al. [1993])
[Figure: tables t(f|e) and c(f|e), each with rows e_i and columns f_j, and Total(e), one entry per e_i holding the sum of the corresponding row of c(f|e)]
t(f|e) is initialized; c(f|e) and Total(e) are initialized to zero.
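A minimal host-side sketch of the three tables, assuming they are stored as flat row-major arrays with row = English word e and column = foreign word f (the struct and member names are hypothetical). With this layout, dividing a flat index by the number of columns recovers the English word, which is how the update kernel on slide 15 reads `device_total_f[pos/Col]`.

```cuda
// Host-side sketch of the slide 8 tables (names are hypothetical).
// Row-major layout: row = English word e, column = foreign word f.
#include <cassert>
#include <cstddef>
#include <vector>

struct Model1Tables {
    int num_e, num_f;
    std::vector<float> t;      // t(f|e), uniform so each row sums to 1
    std::vector<float> c;      // c(f|e), initialized to zero
    std::vector<float> total;  // total(e), one entry per row, initialized to zero

    Model1Tables(int ne, int nf)
        : num_e(ne), num_f(nf),
          t(static_cast<std::size_t>(ne) * nf, 1.0f / nf),
          c(static_cast<std::size_t>(ne) * nf, 0.0f),
          total(static_cast<std::size_t>(ne), 0.0f) {}

    // Flat index of entry (e, f); flat / num_f gives back the row e.
    std::size_t idx(int e, int f) const {
        return static_cast<std::size_t>(e) * num_f + f;
    }
};
```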

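9 IBM Model 1 (Brown et al. [1993])
In each sentence pair, for each f in the foreign sentence, we calculate the sum of t(f|e) over all e in the English sentence, called total_s. Suppose we are given the following sentence pair [Figure: sentence pair], whose English sentence has four words e_1 … e_4. For the foreign word f_2:
$$\text{total}_s[2] = t(f|e)[1,2] + t(f|e)[2,2] + t(f|e)[3,2] + t(f|e)[4,2]$$
$$C(f|e)[1,2] \mathrel{+}= t(f|e)[1,2] \,/\, \text{total}_s[2]$$
$$\text{total}(e)[1] \mathrel{+}= t(f|e)[1,2] \,/\, \text{total}_s[2]$$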
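A small worked example of the same update; the four t(f|e)[i,2] values are hypothetical and chosen only to keep the arithmetic simple:

$$t(f|e)[1,2]=0.1,\quad t(f|e)[2,2]=0.2,\quad t(f|e)[3,2]=0.15,\quad t(f|e)[4,2]=0.05 \;\Rightarrow\; \text{total}_s[2]=0.5$$
$$C(f|e)[1,2] \mathrel{+}= \frac{0.1}{0.5} = 0.2, \qquad \text{total}(e)[1] \mathrel{+}= 0.2$$
$$\sum_{i=1}^{4}\frac{t(f|e)[i,2]}{\text{total}_s[2]} = \frac{0.1+0.2+0.15+0.05}{0.5} = 1$$

The last line shows that each foreign word contributes exactly one unit of count, split among the English words of its sentence in proportion to the current t(f|e).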

10 IBM Model 1 (Brown et al. [1993])
After processing all sentence pairs in the corpus, update the value of t(f|e) for all e and f:
$$t(f|e)[i,j] = C(f|e)[i,j] \,/\, \text{total}(e)[i]$$
Then start processing the sentence pairs again, calculating C(f|e) and total(e) from the new t(f|e). Continue until the values of t(f|e) have converged.
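Why total(e) is the right normalizer: on slide 9, every amount added to C(f|e)[i,j] is also added to total(e)[i], so after a full pass total(e)[i] equals the row sum of C(f|e), and the update yields a proper probability distribution over foreign words for each English word:

$$\text{total}(e)[i] = \sum_{j} C(f|e)[i,j] \;\Longrightarrow\; \sum_{j} t(f|e)[i,j] = \frac{\sum_{j} C(f|e)[i,j]}{\text{total}(e)[i]} = 1$$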

11 IBM Model 1 (Pseudocode)
initialize t(f|e)                                  // initialization
do until convergence
    c(f|e) = 0 for all e and f                     // initialize to zero
    total(e) = 0 for all e
    for all sentence pairs do
        total(s,f) = 0 for all f
        for all f in f(s) do                       // calculate total(s,f) for each f in f(s)
            for all e in e(s) do
                total(s,f) += t(f|e)
        for all e in e(s) do                       // calculate c(f|e) and total(e)
            for all f in f(s) do
                c(f|e) += t(f|e) / total(s,f)
                total(e) += t(f|e) / total(s,f)
    for all e do                                   // update t(f|e) using c(f|e) and total(e)
        for all f do
            t(f|e) = c(f|e) / total(e)
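A runnable host-side sketch of this pseudocode, usable as a CPU reference. It assumes integer word ids and the row-major [e][f] table layout (both hypothetical, as are the names `SentencePair` and `train_model1`), runs a fixed number of iterations, and computes total(s,f) just before distributing the counts for f rather than in a separate loop; the result is the same because t(f|e) does not change during a pass.

```cuda
// Host-only sketch of the slide 11 algorithm (compiles as part of a .cu file).
// Hypothetical layout: t[e * num_f + f] == t(f|e).
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct SentencePair {
    std::vector<int> e;  // English word ids
    std::vector<int> f;  // foreign word ids
};

// Runs a fixed number of EM iterations and returns t(f|e).
std::vector<float> train_model1(const std::vector<SentencePair>& corpus,
                                int num_e, int num_f, int iterations) {
    auto at = [num_f](int e, int f) { return static_cast<std::size_t>(e) * num_f + f; };
    std::vector<float> t(static_cast<std::size_t>(num_e) * num_f, 1.0f / num_f);
    std::vector<float> c(t.size());
    std::vector<float> total_e(num_e);
    for (int it = 0; it < iterations; ++it) {
        std::fill(c.begin(), c.end(), 0.0f);              // c(f|e) = 0 for all e, f
        std::fill(total_e.begin(), total_e.end(), 0.0f);  // total(e) = 0 for all e
        for (const SentencePair& s : corpus) {
            for (int f : s.f) {
                float total_s = 0.0f;  // total(s,f): sum of t(f|e) over e in e(s)
                for (int e : s.e) total_s += t[at(e, f)];
                for (int e : s.e) {    // shares for this f sum to 1 over e(s)
                    float share = t[at(e, f)] / total_s;
                    c[at(e, f)] += share;
                    total_e[e] += share;
                }
            }
        }
        for (int e = 0; e < num_e; ++e) {     // t(f|e) = c(f|e) / total(e)
            if (total_e[e] == 0.0f) continue;  // English word absent from corpus: keep row
            for (int f = 0; f < num_f; ++f) t[at(e, f)] = c[at(e, f)] / total_e[e];
        }
    }
    return t;
}
```

For example, with the three pairs "das Haus"/"the house", "das Buch"/"the book" and "ein Buch"/"a book", one iteration gives t(das|the) = 0.5 and t(Haus|the) = 0.25, and further iterations keep increasing t(das|the).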

12 Parallelizing IBM Model 1
initialize t(f|e)                                  // independent for each (f, e)
do until convergence
    c(f|e) = 0 for all e and f                     // independent for each (f, e)
    total(f) = 0 for all f
    for all sentence pairs do                      // each sentence pair is processed independently of the others
        total(s,f) = 0 for all f
        for all e in e(s) do
            for all f in f(s) do
                total(s,f) += t(f|e)
        for all e in e(s) do
            for all f in f(s) do
                c(f|e) += t(f|e) / total(s,f)
                total(f) += t(f|e) / total(s,f)
    for all e do                                   // updating each t(f|e) is independent for all e and f
        for all f do
            t(f|e) = c(f|e) / total(f)

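13 Initialize t(f|e)
Each thread initializes one entry of t(f|e) to a specified value. NUM_F is the number of foreign words; there is no bounds check, so the grid must have exactly one thread per entry.
__global__ void initialize(float* device_t_f_e){
    int pos = blockIdx.x*blockDim.x + threadIdx.x;  // this thread's entry
    device_t_f_e[pos] = (1.0f/NUM_F);               // uniform t(f|e)
}
Underflow is possible, so the values are scaled by 100000:
__global__ void initialize(float* device_t_f_e){
    int pos = blockIdx.x*blockDim.x + threadIdx.x;
    device_t_f_e[pos] = (100000.0f/NUM_F);          // float literal avoids integer division
}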
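Why the factor of 100000 does not change the alignment result: every count update on slide 9 divides a t(f|e) value by a sum of t(f|e) values for the same foreign word, so a common factor α cancels (k runs over the English words of the sentence pair):

$$\frac{\alpha\, t(f|e)[i,j]}{\sum_{k} \alpha\, t(f|e)[k,j]} = \frac{t(f|e)[i,j]}{\sum_{k} t(f|e)[k,j]} = \frac{t(f|e)[i,j]}{\text{total}_s[j]}$$

C(f|e) and total(e) therefore hold ordinary, unscaled counts, and the update kernel on slide 15 multiplies by 100000 again, so the stored t(f|e) stays scaled by the same factor in every iteration.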

14 Process Of Each Sentence Pair
Each thread processes one sentence pair:
for all sentence pairs do
    total(s,f) = 0 for all f
    for all e in e(s) do
        for all f in f(s) do
            total(s,f) += t(f|e)
    for all e in e(s) do
        for all f in f(s) do
            c(f|e) += t(f|e) / total(s,f)
            total(f) += t(f|e) / total(s,f)
• Shared memory is used.
• No reduction is used. Why? Which entries of c(f|e) and total(f) a thread updates depends on the words of its sentence pair (the access pattern is data dependent), so two or more threads may add a value to the same c(f|e) or total(f) entry simultaneously. atomicAdd() is used instead.
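A minimal sketch of such a kernel, assuming a hypothetical CSR-style corpus layout (all sentences concatenated into `e_words` / `f_words`, with offset arrays of length number-of-pairs + 1) and the row-major [e][f] tables. The kernel and helper names are illustrative, and this sketch does not reproduce the shared-memory usage mentioned on the slide; it only shows one thread per pair with `atomicAdd()` on the shared tables.

```cuda
// One thread per sentence pair; atomicAdd() because pairs may share words.
// Hypothetical layout: pair s uses e_words[e_off[s] .. e_off[s+1]) and likewise for f.
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

__global__ void process_pairs(const float* t, float* c, float* total_e,
                              const int* e_words, const int* e_off,
                              const int* f_words, const int* f_off,
                              int num_pairs, int num_f) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's sentence pair
    if (s >= num_pairs) return;
    for (int j = f_off[s]; j < f_off[s + 1]; ++j) {
        int f = f_words[j];
        float total_s = 0.0f;  // total(s,f)
        for (int i = e_off[s]; i < e_off[s + 1]; ++i)
            total_s += t[e_words[i] * num_f + f];
        for (int i = e_off[s]; i < e_off[s + 1]; ++i) {
            int e = e_words[i];
            float share = t[e * num_f + f] / total_s;
            atomicAdd(&c[e * num_f + f], share);  // another pair may hit the same (e, f)
            atomicAdd(&total_e[e], share);
        }
    }
}

template <typename T>
T* to_device(const std::vector<T>& v) {  // allocate and copy; error checks omitted
    T* d = nullptr;
    cudaMalloc(&d, v.size() * sizeof(T));
    cudaMemcpy(d, v.data(), v.size() * sizeof(T), cudaMemcpyHostToDevice);
    return d;
}

// Hypothetical host helper: one pass over the corpus; c and total_e accumulate in place.
void run_pairs_on_gpu(const std::vector<float>& t, std::vector<float>& c,
                      std::vector<float>& total_e,
                      const std::vector<int>& e_words, const std::vector<int>& e_off,
                      const std::vector<int>& f_words, const std::vector<int>& f_off,
                      int num_f) {
    int num_pairs = static_cast<int>(e_off.size()) - 1;
    float* d_t = to_device(t);
    float* d_c = to_device(c);
    float* d_tot = to_device(total_e);
    int* d_ew = to_device(e_words);
    int* d_eo = to_device(e_off);
    int* d_fw = to_device(f_words);
    int* d_fo = to_device(f_off);
    const int threads = 128;
    process_pairs<<<(num_pairs + threads - 1) / threads, threads>>>(
        d_t, d_c, d_tot, d_ew, d_eo, d_fw, d_fo, num_pairs, num_f);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(c.data(), d_c, c.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(total_e.data(), d_tot, total_e.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_t); cudaFree(d_c); cudaFree(d_tot);
    cudaFree(d_ew); cudaFree(d_eo); cudaFree(d_fw); cudaFree(d_fo);
}
```

For the three pairs "das Haus"/"the house", "das Buch"/"the book", "ein Buch"/"a book" with uniform t(f|e) = 0.25, one pass gives c(das|the) = 1 and total(the) = 2.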

15 Updating t(f|e)
Each thread updates one entry of t(f|e) and sets the same entry of c(f|e) to zero for the next iteration:
__global__ void update(float* device_t_f_e, float* device_count_f_e, float* device_total_f, int block_size, int Col)
{
    int pos = blockIdx.x*block_size + threadIdx.x;  // flat index of this entry
    float total = device_total_f[pos/Col];          // normalizer shared by this entry's row
    float count = device_count_f_e[pos];
    device_t_f_e[pos] = (100000*count/total);       // keep the 100000 scale from initialization
    device_count_f_e[pos] = 0;                      // reset c(f|e) for the next iteration
}
Here it is not possible to also set total(f) to zero: threads in other blocks may still need to read the same total(f) entry, and there is no synchronization between threads of different blocks.

16 Setting total(f) to Zero
Each thread sets one entry of total(f) to zero:
__global__ void total(float* device_total_f){
    int pos = threadIdx.x + blockDim.x*blockIdx.x;
    device_total_f[pos] = 0;
}
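A minimal host-side sketch of how the two kernels can be ordered, assuming bounds-checked variants of the slide 15 and 16 kernels (the names `update_t`, `reset_total`, `m_step_on_gpu` and the `scale` parameter are hypothetical; scale = 100000 reproduces the slides, 1 gives plain probabilities). Two launches in the same stream run in order, so the second kernel acts as the grid-wide barrier the single kernel lacks.

```cuda
// Sketch of the M-step as two kernel launches in one stream.
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

__global__ void update_t(float* t, float* c, const float* total, int n, int cols, float scale) {
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos >= n) return;
    t[pos] = scale * c[pos] / total[pos / cols];  // pos / cols = row index
    c[pos] = 0.0f;  // only this thread touches c[pos], so resetting here is safe
}

__global__ void reset_total(float* total, int rows) {
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos < rows) total[pos] = 0.0f;
}

// Hypothetical host helper; assumes every row total is nonzero. Error checks omitted.
void m_step_on_gpu(std::vector<float>& t, std::vector<float>& c,
                   std::vector<float>& total, int cols, float scale) {
    int n = static_cast<int>(t.size());
    int rows = static_cast<int>(total.size());
    float *d_t = nullptr, *d_c = nullptr, *d_tot = nullptr;
    cudaMalloc(&d_t, n * sizeof(float));
    cudaMalloc(&d_c, n * sizeof(float));
    cudaMalloc(&d_tot, rows * sizeof(float));
    cudaMemcpy(d_c, c.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_tot, total.data(), rows * sizeof(float), cudaMemcpyHostToDevice);
    const int threads = 256;
    update_t<<<(n + threads - 1) / threads, threads>>>(d_t, d_c, d_tot, n, cols, scale);
    // Same stream: reset_total starts only after every block of update_t has finished,
    // so no thread can still be reading total[] when it is cleared.
    reset_total<<<(rows + threads - 1) / threads, threads>>>(d_tot, rows);
    cudaMemcpy(t.data(), d_t, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(c.data(), d_c, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(total.data(), d_tot, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_t);
    cudaFree(d_c);
    cudaFree(d_tot);
}
```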

17 Results
Sizes (NUM_F, NUM_E, #SENTPAIR) | CPU-Time | GPU-Time | Speed-Up
2048, 512                       | 0.452049 | 0.061639 | 7.33
4096, 1024                      | 1.736251 | 0.157878 | 10.99
4096, 2048                      | 1.857686 | 0.157961 | 11.76
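The Speed-Up column matches the ratio of the two time columns:

$$\text{Speed-Up} = \frac{\text{CPU-Time}}{\text{GPU-Time}}: \qquad \frac{0.452049}{0.061639} \approx 7.33, \qquad \frac{1.736251}{0.157878} \approx 10.997, \qquad \frac{1.857686}{0.157961} \approx 11.76$$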

18 Future Goals
• Convergence condition:
• We currently repeat the iterations of calculating C(f|e) and t(f|e) 5 times.
• Instead, the stopping point should be derived from the values of t(f|e).
• We wish to add such a condition to our code, as it can also be parallelized.
• IBM Model 1 is just one of IBM Models 1-5, which are implemented in the GIZA++ package.
• We wish to parallelize the 4 other models.
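One possible way to derive the stopping point from t(f|e) is to stop once no entry changes by more than a tolerance between two iterations. This criterion, the tolerance `eps`, and the names `check_convergence` / `has_converged` are assumptions for illustration, not the authors' design. Each thread checks one entry, in the same one-thread-per-entry style as the update kernel; in a real run `t_old` would stay on the device instead of being copied each time.

```cuda
// Hypothetical convergence test: converged when no entry of t(f|e)
// moved by more than eps since the previous iteration.
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

__global__ void check_convergence(const float* t_old, const float* t_new, int n,
                                  float eps, int* changed) {
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos < n && fabsf(t_new[pos] - t_old[pos]) > eps)
        *changed = 1;  // every writer stores the same value, so the race is benign
}

// Hypothetical host helper (assumes non-empty tables). Error checks omitted.
bool has_converged(const std::vector<float>& t_old, const std::vector<float>& t_new, float eps) {
    int n = static_cast<int>(t_old.size());
    int changed = 0;
    float *d_old = nullptr, *d_new = nullptr;
    int* d_changed = nullptr;
    cudaMalloc(&d_old, n * sizeof(float));
    cudaMalloc(&d_new, n * sizeof(float));
    cudaMalloc(&d_changed, sizeof(int));
    cudaMemcpy(d_old, t_old.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_new, t_new.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_changed, &changed, sizeof(int), cudaMemcpyHostToDevice);
    const int threads = 256;
    check_convergence<<<(n + threads - 1) / threads, threads>>>(d_old, d_new, n, eps, d_changed);
    cudaMemcpy(&changed, d_changed, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_old);
    cudaFree(d_new);
    cudaFree(d_changed);
    return changed == 0;
}
```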

19 We Want to Express Our Appreciation to:
For her useful comments and valuable notes.
For his kindness and full support.

