Published by Georgia Barrett. Modified over 8 years ago.
1
ARM Processor Operation on Windows CE
  김대홍, Software Team Lead, ㈜씨랩시스
2
Agenda
  Introduction to ARM
  Features of each ARM version
  ARM on Windows CE
  ARM directory structure in Windows CE 5.0
  Support for specific ARM instructions in Windows CE
3
Who is ARM?
  Founded in November 1990, spun out of Acorn Computers
  Designs the ARM RISC processor cores
  Licenses ARM core designs to semiconductor partners, who fabricate and sell to their customers
    ARM does not fabricate silicon itself
  Develops technologies to assist with the design-in of the ARM architecture
    Software tools, boards, debug hardware, application software, bus architectures, peripherals, etc.
4
The ARM® Connected Community
5
Segment Converging on Consumer
6
Licensing Status
7
ARM Instructions
  Instruction code width
    32-bit: ARM
    16-bit: Thumb
    8-bit: Java
  DSP extension instructions
  Thumb-2 (ARM1156)
  TrustZone (ARM1176)
  Instruction types
    Data processing instructions
    Data transfer instructions
    Control flow
8
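Thumb
  Thumb is the instruction set that compresses 32-bit ARM instructions into 16-bit instructions
  A program built from Thumb instructions is about 70% of the size of the same program built from ARM instructions
  From 16-bit memory, Thumb code delivers about 130% of the performance of ARM code
  A Thumb system consumes less power than an ARM system
  A Thumb system costs less than an ARM system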
9
Jazelle
  Jazelle-enabled ARM cores execute 8-bit Java bytecode
  95% of bytecodes executed in hardware (typical)
  Normal JVM: 1.0 Caffeinemarks/MHz
  ARM9EJ: 5.5 Caffeinemarks/MHz
  Significantly more power-efficient
  < 12K extra gates (ARM9EJ-S vs ARM9E-S)
10
Agenda
  Introduction to ARM
  Features of each ARM version
  ARM on Windows CE
  ARM directory structure in Windows CE 5.0
  Support for specific ARM instructions in Windows CE
11
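Features by ARM family
  Item | ARM7 | ARM9 | ARM10 | ARM11
  Pipeline depth | 3-stage | 5-stage | 6-stage | 8-stage
  Typical speed (MHz) | 80 | 150 | 260 | 335
  mW/MHz | 0.06 mW/MHz | 0.19 mW/MHz (+ cache) | 0.5 mW/MHz (+ cache) | 0.4 mW/MHz (+ cache)
  MIPS/MHz | 0.97 | 1.1 | 1.3 | 1.2
  Architecture | Von Neumann | Harvard | Harvard | Harvard
  Multiplier | 8x32 | 8x32 | 16x32 | 16x32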
12
ARM Naming Convention
  ARM{x}{y}{z}{T}{D}{M}{I}{E}{J}{F}{-S}
    x: family
    y: MMU/MPU
    z: cache
    T: Thumb 16-bit decoder
    D: JTAG debug
    M: fast multiplier
    I: EmbeddedICE macrocell
    E: DSP extension instructions
    J: Jazelle
    F: VFP unit
    -S: synthesizable version
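The naming scheme above can be read mechanically. The following is a minimal sketch in portable C, not part of the original slides: the helper name arm_feature_flags and the ARM_FEAT_* constants are hypothetical, and the function only reports letters that literally appear in the name.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical flag values, one per optional letter in the naming scheme. */
enum {
    ARM_FEAT_T = 1u << 0, /* Thumb 16-bit decoder */
    ARM_FEAT_D = 1u << 1, /* JTAG debug */
    ARM_FEAT_M = 1u << 2, /* fast multiplier */
    ARM_FEAT_I = 1u << 3, /* EmbeddedICE macrocell */
    ARM_FEAT_E = 1u << 4, /* DSP extension instructions */
    ARM_FEAT_J = 1u << 5, /* Jazelle */
    ARM_FEAT_F = 1u << 6, /* VFP unit */
    ARM_FEAT_S = 1u << 7  /* synthesizable version (-S) */
};

/* Returns the flags for the letters that follow the family digits,
 * or 0 if the name does not start with "ARM". */
unsigned arm_feature_flags(const char *name)
{
    unsigned flags = 0;
    const char *p;

    if (strncmp(name, "ARM", 3) != 0)
        return 0;
    p = name + 3;
    while (isdigit((unsigned char)*p))  /* skip the x, y, z digits */
        p++;
    for (; *p != '\0'; p++) {
        switch (*p) {
        case 'T': flags |= ARM_FEAT_T; break;
        case 'D': flags |= ARM_FEAT_D; break;
        case 'M': flags |= ARM_FEAT_M; break;
        case 'I': flags |= ARM_FEAT_I; break;
        case 'E': flags |= ARM_FEAT_E; break;
        case 'J': flags |= ARM_FEAT_J; break;
        case 'F': flags |= ARM_FEAT_F; break;
        case 'S': flags |= ARM_FEAT_S; break;
        default: break;                 /* '-' and unknown letters are ignored */
        }
    }
    return flags;
}
```

For example, arm_feature_flags("ARM7TDMI") yields the T, D, M and I flags, and arm_feature_flags("ARM926EJ-S") yields E, J and S. Later cores often include features that are no longer spelled out in the name, so a real tool would need a per-core table rather than this letter scan.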
13
Features by ARM architecture version
  Version | Example cores | ISA* enhancements
  ARMv4 | StrongARM | Signed/unsigned halfword and byte load-store instructions; System mode added
  ARMv4T | ARM7TDMI/ARM9TDMI, ARM720T/ARM920T | Thumb
  ARMv5TE | ARM9E-S, ARM966E-S, ARM1020E, XScale | Improved ARM/Thumb interworking; enhanced multiply instructions; DSP instructions; faster multiply-accumulate
  ARMv5TEJ | ARM7EJ and ARM926EJ | Java acceleration
  ARMv6 | ARM1136EJ-S | SIMD; unaligned data support; multi-processing
  *ISA: Instruction Set Architecture
14
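ARM cache policies by core
  Core | Write policy | Replacement policy | Allocation policy
  ARM720T | write-through | random | read-miss
  ARM740T | write-through | random | read-miss
  ARM920T | write-through, write-back | random, round-robin | read-miss
  ARM926EJS | write-through, write-back | random | read-miss
  ARM940T | write-through, write-back | random, round-robin | read-miss
  ARM1020E | write-through, write-back | random, round-robin | read-miss
  ARM1026EJS | write-through, write-back | random, round-robin | read-miss
  Intel StrongARM | write-through, write-back | random | read-miss
  Intel XScale | write-through, write-back | random | read-miss, write-miss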
15
ARM9TDMI Cached Macrocells
  Due to the complexity of attaching it to a memory system, the ARM9TDMI is not licensed as a stand-alone core
  It is available in a range of cached macrocells
  ARM922T
    2 x 8K caches
    Memory Management Unit (MMU)
    Write buffer
  ARM920T
    As ARM922T, but with 2 x 16K caches
  ARM940T
    2 x 4K caches
    Memory Protection Unit (MPU)
16
ARM9E Processor Core
  ARM9E is based on the ARM9TDMI core
  Architecture V5TE support
    Improved ARM/Thumb interworking
    New 32x16 and 16x16 multiply instructions
    New count-leading-zeros instruction
    New saturated maths instructions
  Core implementation differences
    Single-cycle 32x16 multiplier implementation
    EmbeddedICE Logic RT
17
ARM926EJ-S Overview
  Jazelle state allows direct execution of Java bytecodes
  ARM926EJ-S
    ARM9EJ-S core
    Configurable instruction and data caches
    Instruction and data TCM interfaces
    Memory Management Unit
    2 x 32-bit AHB bus interfaces (instructions and data)
18
ARM1136 Family Overview
  ARM1136JF-S
    Synthesizable ARM V6 architecture high-performance core
    8-stage pipeline
    Static and dynamic branch prediction
    Return stack
    Low-latency interrupt mode
    Physically tagged 4-64K I & D caches
    Internal configurable TCMs
    Four main memory ports
    Jazelle technology
    Integrated VFP coprocessor
  ARM1136J-S
    As above, but with no VFP
19
ARM Architecture v5TE
  Architecture v5TE contains the full v4T ARM and Thumb instruction sets, plus:
    Improved support for interworking (covered in the ARM/Thumb Interworking module)
    Breakpoint instruction (ARM and Thumb)
    Count Leading Zeros instruction
    Extended coprocessor instructions: MCR2 etc.
    Support for saturated mathematics
    Packed half-word signed multiplication instructions
    Double-word coprocessor transfer instructions: MCRR/MRRC
20
Intel StrongARM Overview
  ARM V4 architecture (no Thumb support)
  5-stage pipeline, reduced branch penalty
  Improved multiplier (typically 2 cycles faster than ARM9TDMI)
  No support for Multi-ICE debugging (JTAG limited to connectivity test)
  No external coprocessor bus
  SA-110: 16K I & D caches, 8 x 16-byte write buffer
  SA-1100/1110:
    On-chip peripherals, memory controller
    Smaller cache sizes
    PID register
    Instruction breakpointing via CP15
21
Intel XScale Overview
  Architecture V5TE compatibility
  7-8 stage pipeline with statistical branch prediction
  32K data and instruction caches, plus 2K data minicache
  8-entry write buffer, 4-entry fill and pend buffers
  Full 32-bit coprocessor interface
  Debug and performance monitoring logic (via CP14)
  Multiply-accumulate block (as CP0)
  Configurable core clock speed: 100-733 MHz from a 33-66 MHz input clock
  Async input bus clock to 100 MHz (max 1/3 of bus core clock)
  e.g. Intel 80200 processor:
    Interrupt controller (implemented as CP13)
    ECC memory protection
22
Agenda
  Introduction to ARM
  Features of each ARM version
  ARM on Windows CE
  ARM directory structure in Windows CE 5.0
  Support for specific ARM instructions in Windows CE
23
The microprocessor families supported in Windows CE
  ARM: architectures v4, v4T, Thumb, v5TE, and Intel XScale
  x86
  SHx: Renesas SuperH SH4 microprocessors
  MIPS: NEC, Toshiba, Philips Semiconductor, Integrated Device Technologies, LSI Logic, Quantum Effect Design
24
Changes in ARM support across CE versions
  Windows CE 5.0
    The ARMV4 kernel was merged into the ARMV4I kernel.
  Windows CE .NET 4.2
    The ARMV4T (Thumb) kernel was merged into the ARMV4I kernel.
    However, the ARMV4I kernel continues to support 16-bit Thumb applications.
25
Notes on using ARM CPUs (I)
  The ARM kernel places no restrictions on register use
  Physical addresses can be statically (direct-) mapped to virtual addresses in 1 MB units
    OEMAddressTable: the static mapping table
    The kernel uses this table to create two regions
      0x80000000 to 0x9FFFFFFF: caching and buffering enabled
      0xA0000000 to 0xBFFFFFFF: caching and buffering disabled
  The OEMInterruptHandlerFIQ function must be present in the OAL for the build to succeed, even if it is never used
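The two statically mapped windows can be illustrated with a small lookup. This is a sketch in portable C under stated assumptions, not the real OAL code: the struct layout, the table contents (a 64 MB RAM bank and a 1 MB register block at made-up physical addresses), and the helper name pa_to_va are all hypothetical. It relies only on the fact stated above, that the uncached window sits 0x20000000 above the cached one.

```c
#include <stddef.h>
#include <stdint.h>

/* One static mapping: cached virtual start, physical start, size in MB. */
typedef struct {
    uint32_t va_cached; /* inside 0x80000000..0x9FFFFFFF */
    uint32_t pa;        /* physical start address */
    uint32_t size_mb;   /* size in 1 MB units; 0 terminates the table */
} addr_map_entry;

/* Hypothetical example table, for illustration only. */
static const addr_map_entry example_table[] = {
    { 0x80000000u, 0x30000000u, 64 }, /* assumed RAM bank */
    { 0x84000000u, 0x48000000u, 1 },  /* assumed peripheral registers */
    { 0, 0, 0 }
};

/* Offset from the cached window to the uncached window. */
#define UNCACHED_OFFSET 0x20000000u

/* Translates a physical address to a cached or uncached virtual address.
 * Returns 0 when the address is not covered by the table. */
uint32_t pa_to_va(const addr_map_entry *table, uint32_t pa, int cached)
{
    for (; table->size_mb != 0; table++) {
        uint32_t size = table->size_mb << 20; /* MB to bytes */
        if (pa >= table->pa && pa - table->pa < size) {
            uint32_t va = table->va_cached + (pa - table->pa);
            return cached ? va : va + UNCACHED_OFFSET;
        }
    }
    return 0;
}
```

With this table, physical address 0x30001000 maps to 0x80001000 for cached access and to 0xA0001000 for uncached access. Device registers would normally be accessed through the uncached alias.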
26
Notes on using ARM CPUs (II)
  Nested interrupts
    ARM CPUs support two interrupts
      IRQ: Interrupt Request
      FIQ: Fast Interrupt Request
  When GetSystemInfo is used to obtain processor information, the dwProcessorType member of the SYSTEM_INFO structure does not return an accurate value
    Set the CEProcessorType global variable to PROCESSOR_ARM720 or PROCESSOR_STRONGARM, depending on the system
    In the OEMInit function, implement IOCTL_PROCESSOR_INFORMATION and assign a value to CEProcessorType
27
CPU Initialization: ARM Kernels
  Switch the CPU to Supervisor mode
  Disable IRQ and FIQ
  Disable the MMU, I-cache, and D-cache
  Flush or invalidate the I-cache, D-cache, and TLB, and drain the write buffer
  Configure
    GPIO
    Memory controller
    Interrupt controller
    RTC
  Get the physical address of OEMAddressTable and store it in R0
  Jump to KernelStart
28
CPU Initialization: XScale performance improvement
  On XScale, performance can be improved by changing the Branch Target Buffer Enable bit
  Change it in the OEMInit function
29
CPU Dependencies for OAL Functions (I)
  Architectures: MIPSII, MIPSII_FP, MIPSIV, MIPSIV_FP, ARMV4, ARMV4I, ARMV4T, SH4, x86
  CacheErrorHandler: X X
  InitClock: X X X X X
  OEMARMCacheMode: X X
  OEMCacheRangeFlush: X X X X
  OEMClearDebugCommError: X
  OEMDataAbortHandler: X
  OEMFlushCache: X
  OEMGetExtensionDRAM: X X X X X X
  OEMGetRealTime: X X X X X X
  OEMIdle: X X X X X X
  OEMInit: X X X X X X
  OEMInitDebugSerial: X X X X X X
  OEMInterruptDisable: X X X X X X
  OEMInterruptDone: X X X X X X
30
CPU Dependencies for OAL Functions (II)
  Architectures: MIPSII, MIPSII_FP, MIPSIV, MIPSIV_FP, ARMV4, ARMV4I, ARMV4T, SH4, x86
  OEMInterruptEnable: X X X X X X
  OEMInterruptHandler: X X
  OEMInterruptHandlerFIQ: X X
  OEMIoControl: X X X X X X
  OEMNMI: X
  OEMNMIHandler: X
  OEMPowerOff: X X X X X X
  OEMReadDebugByte: X X X X X X
  OEMSetAlarmTime: X X X X X X
  OEMSetRealTime: X X X X X X
  OEMWriteDebugByte: X X X X X X
  OEMWriteDebugString: X X X X X X
  SC_GetTickCount: X X X X X X
31
Agenda
  Introduction to ARM
  Features of each ARM version
  ARM on Windows CE
  ARM directory structure in Windows CE 5.0
  Support for specific ARM instructions in Windows CE
32
The production-quality OEM adaptation layer (OAL)
  Shortens the OAL development process
  Provides the OAL as improved, componentized code
  Provides consistent ease of use across processor families
33
Directories (the production-quality OAL model)
  %_WINCEROOT%\Platform\
    Memory-mapped configuration files
    Some include files that define the memory layout for the hardware platform that matches Config.bib
    Some glue logic, which is board-level customization code that unites everything in the %_WINCEROOT%\Platform\Common directory
  %_WINCEROOT%\Platform\Common
    Contains the CPU-specific OAL routines
  %_WINCEROOT%\Public\Common\Oak\CSP
    The chip support package (CSP) directory contains a collection of system-on-a-chip (SOC) and CPU or chipset-level peripheral drivers. You can port the CSP driver for a core peripheral to any new hardware platform environment that makes use of the SOC or chipset without modification.
34
\Platform\ (the production-quality OAL)
  BSP
    Samsung SMDK2410: %_WINCEROOT%\Platform\SMDK2410
    Intel Mainstone II: %_WINCEROOT%\Platform\MainstoneII
35
\Platform\Common (I)
  \Platform\Common subdirectory | Description
  Src\ARM | Contains the ARM processor-specific OAL code.
  Src\ARM\ARM920T | Contains all the CPU OAL code required for the ARM920T processor.
  Src\ARM\ARM920T\Abort | Contains the abort routines specific to the ARM920T CPU.
  Src\ARM\ARM920T\Cache | Contains the cache routines specific to the ARM920T CPU.
  Src\ARM\ARM926 | Contains the CPU OAL code required for the ARM926 processor.
  Src\ARM\ARM926\Cache | Contains the cache routines specific to the ARM926 CPU.
  Src\ARM\Common | Contains routines that are generic to ARM-based hardware platforms.
  Src\ARM\Common\Cache | Contains the cache routines that are common for all ARM CPUs.
  Src\ARM\Common\Memory | Contains the memory translation routines that are common for all ARM CPUs. The memory routines are used for translating physical addresses to virtual addresses, and vice versa.
36
\Platform\Common (II)
  \Platform\Common subdirectory | Description
  Src\ARM\Inc | Contains include files that are generic to all ARM CPUs.
  Src\ARM\Intel | Contains the OAL code specific to the Intel hardware platform.
  Src\ARM\Intel\PXA250 | Contains routines specific to the PXA250 processor.
  Src\ARM\Intel\PXA27x | Contains routines specific to the PXA27x processor.
  Src\ARM\Intel\SA1100 | Contains routines specific to the SA1100 processor.
  Src\ARM\Samsung | Contains the OAL code specific to the Samsung hardware platform.
  Src\ARM\Samsung\S3C2410x | Contains routines specific to the S3C2410x processor.
37
\Public\Common\Oak\CSP (the production-quality OAL)
  CSP\ \ \.
  Example: the serial function CSP driver
    \Public\Common\Oak\CSP\ARM\SAMSUNG\S3C2410X\SERIAL
38
Agenda
  Introduction to ARM
  Features of each ARM version
  ARM on Windows CE
  ARM directory structure in Windows CE 5.0
  Support for specific ARM instructions in Windows CE
39
ARM10 Intrinsic Functions
  CLZ
    Counts leading zeroes before the first 1-bit
    The common intrinsic _CountLeadingZeros accesses the CLZ instruction
  BKPT
    Creates a soft breakpoint
    The common intrinsic __trap accesses the BKPT instruction
  _swi
    Generates a call to the OS using the SWI software interrupt instruction
  __emit
    Inserts a specified instruction into the instruction stream
  _MoveFromCoprocessor, _MoveFromCoprocessor2
    Read data from the ARM coprocessor
  _MoveToCoprocessor, _MoveToCoprocessor2
    Write data to the ARM coprocessor
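To make the CLZ behaviour concrete, here is a portable C sketch of what the instruction computes. It is a reference model only and does not use the Windows CE intrinsic; the names clz32 and floor_log2 are hypothetical.

```c
#include <stdint.h>

/* Counts the zero bits above the most significant 1-bit of x.
 * Returns 32 for x == 0, matching the result of the CLZ instruction. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) { /* shift left until bit 31 is set */
        x <<= 1;
        n++;
    }
    return n;
}

/* Typical use: integer log2 of a non-zero value. */
unsigned floor_log2(uint32_t x)
{
    return 31u - clz32(x);
}
```

On a core with CLZ, the loop collapses to a single instruction, which is why normalization and priority-search code benefits from the intrinsic.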
40
ARM DSP-enhanced Intrinsic Functions (I)
  Functions | Corresponding ARM DSP instruction | Description
  _SmulAddLo_SW_SL, _SmulAddHi_SW_SL, _SmulAddHiLo_SW_SL, _SmulAddLoHi_SW_SL | SMLAxy | A signed-integer multiply and accumulate operation: 16x16-bit multiply followed by a 32-bit add.
  _SmulAddWLo_SW_SL, _SmulAddWHi_SW_SL | SMLAWy | A 32x16-bit multiply operation, followed by a 32-bit add of the upper 32 bits of the 48-bit product.
  _SmulAddHi_SW_SQ, _SmulAddLo_SW_SQ, _SmulAddHiLo_SW_SQ, _SmulAddLoHi_SW_SQ | SMLALxy | A 16x16-bit multiply operation, followed by a 64-bit add of the product with a 64-bit integer.
  _SmulLo_SW_SL, _SmulHi_SW_SL, _SmulHiLo_SW_SL, _SmulLoHi_SW_SL | SMULxy | A signed-integer 16x16-bit multiply operation.
  _SmulWLo_SW_SL, _SmulWHi_SW_SL | SMULWy | A signed-integer 32x16-bit multiply operation, returning the upper 32 bits.
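The arithmetic behind SMULxy, SMLAxy and SMULWy can be sketched in portable C. This is a reference model written for illustration, not the Windows CE intrinsics themselves: the function names and the top/bottom selector arguments are hypothetical, the Q flag is not modelled, and the code assumes a two's complement target with arithmetic right shift of negative values, which is what mainstream compilers provide.

```c
#include <stdint.h>

/* Bottom and top signed halfwords of a 32-bit register value. */
static int16_t lo16(int32_t v) { return (int16_t)(uint16_t)((uint32_t)v & 0xFFFFu); }
static int16_t hi16(int32_t v) { return (int16_t)(uint16_t)((uint32_t)v >> 16); }

/* SMULxy: signed 16x16 -> 32-bit multiply.
 * a_top / b_top select the top (non-zero) or bottom (zero) halfword. */
int32_t smul_xy(int32_t a, int32_t b, int a_top, int b_top)
{
    int32_t x = a_top ? hi16(a) : lo16(a);
    int32_t y = b_top ? hi16(b) : lo16(b);
    return x * y; /* cannot overflow: |x|, |y| <= 32768 */
}

/* SMLAxy: 16x16 multiply followed by a 32-bit add of acc.
 * The add wraps here; the hardware would also set the Q flag on overflow. */
int32_t smla_xy(int32_t a, int32_t b, int a_top, int b_top, int32_t acc)
{
    uint32_t sum = (uint32_t)smul_xy(a, b, a_top, b_top) + (uint32_t)acc;
    return (int32_t)sum;
}

/* SMULWy: 32x16 multiply, returning the upper 32 bits of the 48-bit product. */
int32_t smulw_y(int32_t a, int32_t b, int b_top)
{
    int64_t product = (int64_t)a * (b_top ? hi16(b) : lo16(b));
    return (int32_t)(product >> 16);
}
```

Packing two 16-bit samples per register and choosing halves with x and y is what lets a filter loop process two samples per load.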
41
ARM DSP-enhanced Intrinsic Functions (II)
  Functions | Corresponding ARM DSP instruction | Description
  _AddSatInt | QADD | A saturating add instruction.
  _SubSatInt | QSUB | A saturating subtract instruction.
  _DAddSatInt | QDADD | An instruction to double an integer and saturate, and then add to a second integer and saturate.
  _DSubSatInt | QDSUB | An instruction to double an integer and saturate, and then subtract from a second integer and saturate.
  _ReadCoProcessor | MRRC | An operation to transfer values from a coprocessor to two ARM registers.
  _WriteCoProcessor | MCRR | An operation to transfer two ARM register values to a coprocessor.
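Saturating arithmetic is easiest to see next to ordinary wrapping arithmetic. The sketch below models QADD, QSUB, QDADD and QDSUB in portable C by computing in 64 bits and clamping to the 32-bit range. The function names are hypothetical stand-ins rather than the Windows CE intrinsics, and the sticky Q flag that the hardware sets on saturation is not modelled.

```c
#include <stdint.h>

/* Clamps a 64-bit intermediate result to the signed 32-bit range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX)
        return INT32_MAX;
    if (v < INT32_MIN)
        return INT32_MIN;
    return (int32_t)v;
}

/* QADD: saturating add. */
int32_t qadd(int32_t a, int32_t b)  { return sat32((int64_t)a + b); }

/* QSUB: saturating subtract. */
int32_t qsub(int32_t a, int32_t b)  { return sat32((int64_t)a - b); }

/* QDADD: double b with saturation, then add to a with saturation. */
int32_t qdadd(int32_t a, int32_t b) { return sat32((int64_t)a + sat32((int64_t)b * 2)); }

/* QDSUB: double b with saturation, then subtract from a with saturation. */
int32_t qdsub(int32_t a, int32_t b) { return sat32((int64_t)a - sat32((int64_t)b * 2)); }
```

For instance, qadd(INT32_MAX, 1) stays at INT32_MAX instead of wrapping to a large negative value, which is the behaviour audio and fixed-point signal code usually wants.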
42
ARM XScale Intrinsic Functions
  Functions | ARM XScale instruction | Description
  _SmulAdd_SL_ACC | MIA | Multiplies the signed value in register Rs by the signed value in register Rm, and then adds the result to the 40-bit accumulator.
  _SmulAddPack_2SW_ACC | MIAPH | Performs two 16x16 signed multiplications on packed half-word data and accumulates these to a single 40-bit accumulator.
  _SmulAddLo_SW_ACC, _SmulAddHi_SW_ACC, _SmulAddLoHi_SW_ACC, _SmulAddHiLo_SW_ACC | MIAxy | Performs one 16-bit signed multiplication and accumulates the result to a single 40-bit accumulator.
  _WriteCoProcessor | MAR | Moves 64 bits of data from ARM registers to coprocessor registers.
  _ReadCoProcessor | MRA | Moves 64 bits of data to ARM registers from coprocessor registers.
  _PreLoad | PLD | This instruction is used as a hint to the memory system that a memory access from the specified address will occur shortly.
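The 40-bit accumulator operations can be modelled in portable C as follows. This is an illustrative sketch, not the Windows CE intrinsics: the names mia, miaph and wrap40 are hypothetical, the accumulator is held in an int64_t, and the code assumes that results simply wrap at 40 bits (consult the XScale documentation for the exact overflow behaviour). It also assumes a two's complement target.

```c
#include <stdint.h>

/* Keeps the low 40 bits of v and sign-extends from bit 39
 * (assumed wrap-around behaviour of the accumulator). */
static int64_t wrap40(int64_t v)
{
    uint64_t u = (uint64_t)v & 0xFFFFFFFFFFull;
    if (u & 0x8000000000ull)
        u |= ~0xFFFFFFFFFFull;
    return (int64_t)u;
}

/* Bottom and top signed halfwords of a 32-bit register value. */
static int16_t lo16(int32_t v) { return (int16_t)(uint16_t)((uint32_t)v & 0xFFFFu); }
static int16_t hi16(int32_t v) { return (int16_t)(uint16_t)((uint32_t)v >> 16); }

/* MIA: acc += Rm * Rs (signed 32x32 multiply, 40-bit accumulate). */
int64_t mia(int64_t acc, int32_t rm, int32_t rs)
{
    return wrap40(acc + (int64_t)rm * rs);
}

/* MIAPH: acc += top(Rm) * top(Rs) + bottom(Rm) * bottom(Rs). */
int64_t miaph(int64_t acc, int32_t rm, int32_t rs)
{
    int64_t top = (int64_t)hi16(rm) * hi16(rs);
    int64_t bottom = (int64_t)lo16(rm) * lo16(rs);
    return wrap40(acc + top + bottom);
}
```

The extra 8 bits above a 32-bit result give headroom for long dot products: many 16x16 products can be summed before the accumulator needs to be read back.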
43
References
  ARM System Developer's Guide (Korean edition)
  "ARM Technical Training Course" manual
  http://msdn.microsoft.com/embedded/
44
© 2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.