Computer Architecture and System Programming Laboratory TA Session 12 x86-SSE text string processing instructions
X86-SSE Programming – Text Strings (SSE4.2)

An implicit-length text string uses a terminating End-Of-String (EOS) character. X86-SSE includes four SIMD text string instructions (pcmpistri, pcmpistrm, pcmpestri, pcmpestrm) that can process text string fragments of up to 128 bits in length.

Suppose you are given a text string fragment and want to create a mask that indicates the positions of the uppercase characters within it. For example, each 1 in the mask 1000110000010010b, read left to right in string order, marks an uppercase character at the corresponding position of the text string "Ab1cDE23f4gHi5J6". The desired character range and the text string fragment are loaded into registers XMM1 and XMM2, respectively.
Resulting mask value (bit 0 corresponds to the first character): 0x4831 = 0100100000110001b
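The range comparison can be modeled in scalar code to check the mask value by hand. The sketch below is a Python model, not the instruction itself; the function name `range_mask` is hypothetical, and it assumes byte-sized characters and implicit-length operands (elements after the first null byte are ignored).

```python
def range_mask(ranges: bytes, fragment: bytes) -> int:
    """Scalar model of a 'ranges' comparison with bit-mask output.

    ranges   -- pairs of (low, high) bytes, terminated by a null byte
    fragment -- up to 16 characters, optionally terminated by a null byte
    """
    def valid(data: bytes) -> bytes:
        # Implicit length: only the bytes before the first null are valid.
        data = data[:16]
        end = data.find(0)
        return data if end < 0 else data[:end]

    limits = valid(ranges)
    pairs = [(limits[i], limits[i + 1]) for i in range(0, len(limits) - 1, 2)]

    mask = 0
    for i, ch in enumerate(valid(fragment)):
        if any(lo <= ch <= hi for lo, hi in pairs):
            mask |= 1 << i  # bit i corresponds to character i
    return mask


mask = range_mask(b"AZ" + bytes(14), b"Ab1cDE23f4gHi5J6")
print(hex(mask))                   # 0x4831
print(format(mask, "016b")[::-1])  # 1000110000010010 (string order)
```

Reversing the 16-bit binary string reproduces the left-to-right form of the mask, which is why the same result can be written both as 1000110000010010b and as 0x4831.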
Further examples (figures show XMM0 and RFLAGS after the instruction):
Byte mask: the output format bit 6 is set, which means that the mask value is expanded to bytes.
Multiple character ranges: XMM1 contains two range pairs, one for uppercase letters and one for lowercase letters.
Embedded EOS: the text string fragment includes an embedded EOS ('\0') character. ZF is set to 1, and the final mask value excludes matching range characters following EOS.
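The three variants (byte-expanded mask, two range pairs, embedded EOS) can be illustrated with a small scalar model. This is a sketch under assumptions: the names `letters_mask` and `expand_to_bytes` are hypothetical, the input is taken to be one 16-byte fragment, and only the mask and ZF are modeled.

```python
RANGES = [(ord("A"), ord("Z")), (ord("a"), ord("z"))]  # two range pairs


def letters_mask(fragment: bytes):
    """Return (bit mask, ZF) for a 16-byte fragment matched against RANGES."""
    mask, zf = 0, 0
    for i, ch in enumerate(fragment[:16]):
        if ch == 0:
            zf = 1   # EOS found in the fragment
            break    # characters after EOS never match
        if any(lo <= ch <= hi for lo, hi in RANGES):
            mask |= 1 << i
    return mask, zf


def expand_to_bytes(bit_mask: int) -> bytes:
    """Model of output format bit 6: each mask bit becomes a 0x00/0xFF byte."""
    return bytes(0xFF if (bit_mask >> i) & 1 else 0x00 for i in range(16))


print(letters_mask(b"Ab1cDE23f4gHi5J6"))      # (23867, 0), i.e. mask 0x5d3b
print(letters_mask(b"Ab1c\x00E23f4gHi5J6"))   # (11, 1): only 'A', 'b', 'c' match
print(expand_to_bytes(0x4831).hex())
```

In the second call the letters after the null byte are ignored, so the mask keeps only bits 0, 1 and 3 and ZF reports that an EOS was seen.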
RFLAGS is set in a non-standard manner in order to supply the most relevant information:
CF flag: reset if IntRes2 is equal to zero, set otherwise
ZF flag: set if any byte/word of xmm2/mem128 is null, reset otherwise
SF flag: set if any byte/word of xmm1 is null, reset otherwise
OF flag: IntRes2[0]
AF flag: reset
PF flag: reset
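The flag rules listed above translate directly into a few boolean expressions. The following is a minimal Python sketch for the byte-sized, implicit-length case; the function name `string_compare_flags` and the dictionary layout are assumptions made for illustration.

```python
def string_compare_flags(int_res2: int, xmm1: bytes, xmm2: bytes) -> dict:
    """Model the flag settings of an implicit-length byte string compare."""
    return {
        "CF": int(int_res2 != 0),   # set if IntRes2 is non-zero
        "ZF": int(0 in xmm2[:16]),  # set if xmm2/mem128 contains a null byte
        "SF": int(0 in xmm1[:16]),  # set if xmm1 contains a null byte
        "OF": int_res2 & 1,         # copy of IntRes2[0]
        "AF": 0,                    # always reset
        "PF": 0,                    # always reset
    }


# Uppercase range in xmm1, full 16-character fragment in xmm2, IntRes2 = 0x4831.
flags = string_compare_flags(0x4831, b"AZ" + bytes(14), b"Ab1cDE23f4gHi5J6")
print(flags)  # {'CF': 1, 'ZF': 0, 'SF': 1, 'OF': 1, 'AF': 0, 'PF': 0}
```

SF is 1 here because the range operand is padded with null bytes, while ZF is 0 because the fragment fills all 16 bytes without an EOS.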
Example: converting the uppercase letters of a fragment to lowercase

    section .data
    str:        db 'Ab1cDE23f4gHi5J6'
    AZ_mask:    db 'A', 'Z'               ; one range pair: 'A'..'Z'
                times 14 db 0
    imm:        equ 01000100b             ; ranges, mask expanded to bytes (bit 6)
    AZ2az_mask: times 16 db ('a' - 'A')
    result:     times 16 db 0
                db `\n\0`

    extern printf

    section .text
    global main
    main:
        enter 0, 0
        movdqu xmm1, [AZ_mask]
        movdqu xmm2, [str]
        pcmpistrm xmm1, xmm2, imm         ; xmm0 = 0xFF at each uppercase position
        movdqu xmm3, [AZ2az_mask]
        pand xmm0, xmm3                   ; xmm0 = 'a' - 'A' at uppercase positions
        paddb xmm2, xmm0                  ; add the offset to the uppercase letters
        movdqu [result], xmm2
        mov rdi, result
        mov rax, 0                        ; no vector registers passed to printf
        call printf
        leave
        ret

MOVDQU xmm1, xmm2/m128: Move unaligned double quadword from xmm2/m128 to xmm1.
PADDB xmm1, xmm2/m128: Add packed byte integers from xmm2/m128 and xmm1.
PAND xmm1, xmm2/m128: Bitwise AND of xmm2/m128 and xmm1.
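The data flow of the program (byte mask, AND with the 'a' - 'A' offset, byte-wise add) can be traced with a scalar model. This Python sketch is illustrative only; `to_lower_fragment` is a hypothetical name, and the model assumes a single 16-byte fragment.

```python
def to_lower_fragment(fragment: bytes) -> bytes:
    """Trace the mask / AND / add steps on one 16-byte fragment."""
    if len(fragment) != 16:
        raise ValueError("the model expects exactly 16 bytes")

    # Step 1 (pcmpistrm, byte mask): 0xFF for each uppercase letter before EOS.
    byte_mask = bytearray(16)
    for i, ch in enumerate(fragment):
        if ch == 0:
            break
        if ord("A") <= ch <= ord("Z"):
            byte_mask[i] = 0xFF

    # Step 2 (pand): keep the offset 'a' - 'A' only at uppercase positions.
    offset = ord("a") - ord("A")
    delta = bytes(m & offset for m in byte_mask)

    # Step 3 (paddb): byte-wise add with wrap-around.
    return bytes((c + d) & 0xFF for c, d in zip(fragment, delta))


print(to_lower_fragment(b"Ab1cDE23f4gHi5J6"))  # b'ab1cde23f4ghi5j6'
```

Because the offset is added only where the mask byte is 0xFF, digits and lowercase letters pass through unchanged.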
Equal any (imm[3:2] = 00). The result is a mask: 1 if the character belongs to the set, 0 if not.
pcmpistrm xmm1, xmm2, 01000000b
[Figure: xmm1 holds the character set 'a', 'b', 'k', '1', '2' followed by '\0'; xmm2 holds the string 'a', 'C', 'k', '1' followed by '\0'; each byte of the result in xmm0 is 00 or FF.]

Equal each (imm[3:2] = 10). The result is a mask: 1 if the corresponding bytes are equal, 0 if not equal.
pcmpistrm xmm1, xmm2, 01001000b
[Figure: xmm1 holds 'a', 'b', 'k', '1' followed by '\0'; xmm2 holds 'a', 'C', 'k', '1' followed by '\0'; each byte of the result in xmm0 is 00 or FF.]
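The difference between the two aggregation modes is easiest to see side by side. The Python sketch below models both on 16-byte, implicit-length operands; the names `pad`, `valid_len`, `equal_any` and `equal_each` are hypothetical, and the inputs are illustrative strings. The model follows the documented handling of invalid elements (those at or after the first null): in equal each, a position where both operands are invalid counts as equal.

```python
def pad(text: bytes) -> bytes:
    """Pad a short string with null bytes to a 16-byte register image."""
    return text.ljust(16, b"\0")


def valid_len(reg: bytes) -> int:
    """Number of valid elements: bytes before the first null (at most 16)."""
    end = reg[:16].find(0)
    return 16 if end < 0 else end


def equal_any(xmm1: bytes, xmm2: bytes) -> int:
    """Bit i is set if character i of xmm2 occurs in the set held in xmm1."""
    chars = set(xmm1[:valid_len(xmm1)])
    return sum(1 << i for i in range(valid_len(xmm2)) if xmm2[i] in chars)


def equal_each(xmm1: bytes, xmm2: bytes) -> int:
    """Bit i is set if xmm1[i] == xmm2[i], or if both positions are past EOS."""
    n1, n2 = valid_len(xmm1), valid_len(xmm2)
    mask = 0
    for i in range(16):
        v1, v2 = i < n1, i < n2
        if v1 and v2:
            bit = xmm1[i] == xmm2[i]
        else:
            bit = (not v1) and (not v2)  # one-sided EOS is a mismatch
        if bit:
            mask |= 1 << i
    return mask


print(bin(equal_any(pad(b"ab2k1"), pad(b"aCk1"))))   # 0b1101
print(hex(equal_each(pad(b"abk1"), pad(b"aCk1"))))   # 0xfffd
```

In the equal each result, bits 4 to 15 are set because both strings have already ended there; only bit 1 ('b' versus 'C') is clear.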
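Equal ordered (imm[3:2] = 11). The result is a mask: 1 if the substring is found at the corresponding position, 0 otherwise.
pcmpistrm xmm1, xmm2, 01001100b
[Figure: xmm1 holds the substring "We" followed by '\0'; xmm2 holds a longer text (the figure shows the characters '!', 'd', 'e', 'W', 'B', 'l', 'i', 'n', 'h'); each byte of the result in xmm0 is 00 or FF.]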
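Equal ordered is a substring search, which the following scalar model makes explicit. It is a Python sketch with hypothetical names (`pad`, `valid_len`, `equal_ordered`) and illustrative inputs. It follows the documented rules for invalid elements: a substring element past its EOS always matches, and comparisons that would run beyond the 16-byte fragment are not checked, so a partial match at the very end is reported.

```python
def pad(text: bytes) -> bytes:
    """Pad a short string with null bytes to a 16-byte register image."""
    return text.ljust(16, b"\0")


def valid_len(reg: bytes) -> int:
    """Number of valid elements: bytes before the first null (at most 16)."""
    end = reg[:16].find(0)
    return 16 if end < 0 else end


def equal_ordered(needle: bytes, haystack: bytes) -> int:
    """Bit j is set if the needle (xmm1) matches the haystack (xmm2) at j."""
    n1, n2 = valid_len(needle), valid_len(haystack)
    mask = 0
    for j in range(16):
        found = True
        for i in range(16 - j):       # positions beyond byte 15 are not checked
            if i >= n1:
                break                 # end of needle: everything matched
            k = i + j
            if k >= n2 or needle[i] != haystack[k]:
                found = False         # haystack ended or characters differ
                break
        if found:
            mask |= 1 << j
    return mask


print(hex(equal_ordered(pad(b"We"), pad(b"AbWeCdWe"))))       # 0x44
print(hex(equal_ordered(pad(b"We"), b"0123456789abcdeW")))    # 0x8000
```

The second call shows the partial match: only 'W' fits into the last byte, so a caller has to confirm the match with the next fragment.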
pcmpistri example 1 [figure: RFLAGS and RCX after the instruction]
IntRes1 calculation: mask according to the given range (one bit per index, 15 down to 0).
IntRes2 calculation: negation of IntRes1.
RCX = index of the least significant set bit in IntRes2.
Result: RCX = 16 (invalid index), because no bit is set in IntRes2.
pcmpistri example 2 [figure: RFLAGS and RCX after the instruction]
IntRes1 calculation: mask according to the given range (one bit per index, 15 down to 0).
IntRes2 calculation: negation of IntRes1.
RCX = index of the least significant set bit in IntRes2.
Result: RCX = 11 (index of the '\0' character, or length of the string).
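The IntRes1, IntRes2 and index steps can be reproduced with a scalar model. This Python sketch assumes the range 0x01..0xFF (every non-EOS character), negative polarity and least-significant-index output; the function name `ranges_negated_index` is hypothetical.

```python
def ranges_negated_index(ranges: bytes, fragment: bytes):
    """Return (IntRes1, IntRes2, index) for a negated 'ranges' comparison."""
    def valid_len(reg: bytes) -> int:
        end = reg[:16].find(0)
        return 16 if end < 0 else end

    limits = ranges[:valid_len(ranges)]
    pairs = [(limits[i], limits[i + 1]) for i in range(0, len(limits) - 1, 2)]

    # IntRes1: bit i is set if valid character i lies in one of the ranges.
    int_res1 = 0
    for i in range(valid_len(fragment)):
        if any(lo <= fragment[i] <= hi for lo, hi in pairs):
            int_res1 |= 1 << i

    # IntRes2: negation of all 16 bits of IntRes1.
    int_res2 = ~int_res1 & 0xFFFF

    # Index of the least significant set bit, or 16 if no bit is set.
    index = (int_res2 & -int_res2).bit_length() - 1 if int_res2 else 16
    return int_res1, int_res2, index


eos_mask = bytes([0x01, 0xFF]) + bytes(14)
print(ranges_negated_index(eos_mask, b"Ab1cDE23f4gHi5J6"))
# (65535, 0, 16): no EOS in the fragment
print(ranges_negated_index(eos_mask, b"Ab1cDE23f4g".ljust(16, b"\0")))
# (2047, 63488, 11): EOS at index 11
```

With this range, IntRes1 marks every valid character, so the first set bit of the negated mask is exactly the position of the EOS.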
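Example: strlen using pcmpistri

    section .data
    str:      db 'Ab1cDE23f4gHi5J6'
              db 'Ab1cDE23f4g', 0
    EOS_mask: db 0x1, 0xFF            ; range 0x01..0xFF: every non-EOS character
              times 14 db 0
    imm:      equ 00010100b           ; ranges, negative polarity, least significant index

    section .text
    global strlen
    strlen:
        enter 0, 0
        xor rax, rax
        xor rcx, rcx
        movdqu xmm1, [EOS_mask]
    .loop:
        add rax, rcx                  ; advance by the characters scanned so far
        pcmpistri xmm1, [str+rax], imm
        jnz .loop                     ; ZF = 0: no EOS in these 16 bytes
        add rax, rcx                  ; add the index of EOS within the last fragment
        leave
        ret

First loop cycle: the first 16 bytes contain no EOS, so ZF = 0 and RCX = 16.
Second loop cycle: the EOS is found, so ZF = 1 and RCX = 11.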
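The loop structure can be checked with a scalar model that processes the text in 16-byte fragments. This is a Python sketch under assumptions: `first_eos` stands in for the pcmpistri step (returning the index and ZF), `sse_strlen` is a hypothetical name, and bytes past the end of the buffer are modeled as nulls.

```python
def first_eos(chunk: bytes):
    """Model of the pcmpistri step: (index of first null or 16, ZF)."""
    index = chunk.find(0)
    if index < 0:
        return 16, 0   # no EOS in this fragment
    return index, 1    # EOS found, ZF set


def sse_strlen(memory: bytes) -> int:
    """Scan 16 bytes per iteration until a fragment contains the EOS."""
    rax, rcx = 0, 0
    while True:
        rax += rcx                                    # add rax, rcx
        chunk = memory[rax:rax + 16].ljust(16, b"\0")
        rcx, zf = first_eos(chunk)                    # pcmpistri
        if zf:                                        # jnz .loop falls through
            break
    return rax + rcx                                  # offset + index of EOS


text = b"Ab1cDE23f4gHi5J6" + b"Ab1cDE23f4g\0"
print(sse_strlen(text))  # 27
```

The first iteration yields index 16 with ZF clear, the second yields index 11 with ZF set, and the length is 16 + 11 = 27.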