Mohammed Aabed, Sameh Awaideh, Abdul-Rahman Elshafei


Arabic Diacritics (حركات) Based Steganography
Steganography is the practice of hiding information in the redundant bits of an unremarkable cover medium. This presentation discusses new Arabic text steganography schemes.
Outline: Introduction, Background, Proposed Approach, Results & Analysis

Difficulties of Text Steganography
In steganography, the cover medium used to hide the message can be a text, image, video, or audio file. Text is considered the hardest of these to use, because text data contains little redundant information beyond the essential data.
Fig. 1: Data Hiding in Binary Text Documents

Previous Techniques
Many techniques have been proposed for text steganography, most of them graphical in nature:
1. Line shifting: The text is divided into lines. A 1 is encoded by shifting a line by a small fraction that cannot be detected by the naked eye; a 0 is encoded by leaving the line as is.
2. Word shifting: Same as line shifting, but the text is divided into words.
3. Word horizontal shifting: Same as word shifting, but words are shifted left and right to indicate bits.
   Original Text: We are embedding a ‘b’ using horizontal word shifting.
   Modified Text: We are embedding a ‘b’ using horizontal word shifting.
   Hiding ‘b’ = 0x62 = 01100010 in binary
4. White space manipulation: White spaces at the end of a line are not apparent, so they can carry hidden bits.
5. HTML formatting: HTML tag names are case insensitive, so the letter case of tags can be used to hide information.
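As an illustration of technique 4 (white space manipulation), here is a minimal Python sketch. It assumes one hidden bit per line, where a single trailing space means 1 and no trailing space means 0. This convention and the function names `char_to_bits`, `hide_bits`, and `read_bits` are illustrative choices, not something the slides specify.

```python
def char_to_bits(ch):
    """Return the 8-bit binary string of a one-byte character (e.g. 'b' is 0x62)."""
    return format(ord(ch), "08b")


def hide_bits(lines, bits):
    """Hide one bit per line: a trailing space encodes 1, no trailing space encodes 0."""
    if len(bits) > len(lines):
        raise ValueError("not enough lines to hide all bits")
    result = []
    for index, line in enumerate(lines):
        clean = line.rstrip(" ")          # remove any existing trailing spaces
        if index < len(bits) and bits[index] == "1":
            clean += " "                  # invisible marker for a 1 bit
        result.append(clean)
    return result


def read_bits(lines, count):
    """Recover the first `count` hidden bits from the line endings."""
    return "".join("1" if line.endswith(" ") else "0" for line in lines[:count])


cover = ["line %d" % n for n in range(8)]   # hypothetical 8-line cover text
secret = char_to_bits("b")
stego = hide_bits(cover, secret)
print(read_bits(stego, len(secret)) == secret)
```

The visible text is unchanged, which is why the slides group this technique with those that rely on the limitations of human sight; any tool that trims trailing white space would destroy the hidden bits.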

Previous Techniques
Other variations of the previous techniques have been proposed, such as pointed letter shifting and Kashida insertion. Some approaches consider the syntactic structure of the language used; for example, ‘run’ can be used instead of ‘sprint’ so that the choice of synonym carries hidden information.
In summary, the previous techniques exploit one of two areas: the limitations of human sight, or the grammar of a specific language.

Arabic Based Steganography
Arabic is the largest living member of the Semitic language family in terms of speakers (270 million speakers). Its alphabet contains 28 characters, 15 of which have points. اللٌّغَةُ
Characters with no points: أ ح د ر س ص ط ع ك ل م هـ و
Characters with one point: ب ج خ ذ ز ض ظ غ ف ن
Characters with two points: ت ق ي
Characters with three points: ث ش
Fig. 2: Arabic Alphabet

Previous Approaches
Previous Arabic-specific approaches include vertical displacement of the points in the Arabic alphabet to hide information, and using letter points and extensions to hide data.
Fig. 3: Using vertical displacement to hide data (M. Hassan Shirali-Shahreza, Mohammad Shirali-Shahreza)
Fig. 4: Using extensions to hide data (A. Gutub)

Diacritics (Harakat, حركات)
Arabic uses eight symbols as diacritical marks. They are used to alter the pronunciation of a phoneme or to distinguish between words of similar spelling. The use of diacritics is optional in written Standard Arabic.
Diacritics: َ Fatha, ً Tanwin Fatha, ُ Damma, ٌ Tanwin Damma, ِ Kasra, ٍ Tanwin Kasra, ْ Sukun, ّ Shadda
Fig. 5: Arabic Diacritics
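For software processing, the eight marks can be treated as separate Unicode combining characters. The Python sketch below maps the slide's names to the code points U+064B through U+0652; this mapping is an assumption based on the Unicode Arabic block (which calls the Tanwin forms FATHATAN, DAMMATAN, and KASRATAN), and the helper names are hypothetical.

```python
import unicodedata

# Assumed code points for the eight diacritics named on the slide.
DIACRITICS = {
    "\u064E": "Fatha",
    "\u064B": "Tanwin Fatha",
    "\u064F": "Damma",
    "\u064C": "Tanwin Damma",
    "\u0650": "Kasra",
    "\u064D": "Tanwin Kasra",
    "\u0652": "Sukun",
    "\u0651": "Shadda",
}


def strip_diacritics(text):
    """Return the text with all eight diacritics removed (letters are untouched)."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def describe(text):
    """List (slide name, Unicode name) for each diacritic, in reading order."""
    return [(DIACRITICS[ch], unicodedata.name(ch)) for ch in text if ch in DIACRITICS]


# Ba + Fatha, Kaf + Sukun, Ra + Tanwin Kasra, written with escapes for clarity.
word = "\u0628\u064E\u0643\u0652\u0631\u064D"
print(describe(word))
print(strip_diacritics(word) == "\u0628\u0643\u0631")
```

Because each mark is its own character that follows the letter it modifies, adding or removing a diacritic never changes the underlying letters, which is what makes the marks usable as an optional carrier.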

Statistics for Diacritics
First, we needed to find the average number of occurrences of each diacritic in a fully diacritized Arabic document. Then we needed to compare these occurrences to find the best embedding technique available. Both ambiguity and capacity are important factors to consider.
Fig. 6: Sample of diacritized Arabic text
حَدَّثَنَا مُحَمَّدُ بْنُ جَعْفَرٍ قَالَ حَدَّثَنَا شُعْبَةُ عَنْ يَزِيدَ بْنِ خُمَيْرٍ عَنْ سُلَيْمِ بْنِ عَامِرٍ عَنْ أَوْسَطَ قَالَ خَطَبَنَا أَبُو بَكْرٍ رَضِيَ اللَّهُ عَنْهُ فَقَالَ قَامَ رَسُولُ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ مَقَامِي هَذَا عَامَ الْأَوَّلِ وَبَكَى أَبُو بَكْرٍ فَقَالَ أَبُو بَكْرٍ سَلُوا اللَّهَ الْمُعَافَاةَ أَوْ قَالَ الْعَافِيَةَ فَلَمْ يُؤْتَ أَحَدٌ قَطُّ بَعْدَ الْيَقِينِ أَفْضَلَ مِنْ الْعَافِيَةِ أَوْ الْمُعَافَاةِ عَلَيْكُمْ بِالصِّدْقِ فَإِنَّهُ مَعَ الْبِرِّ وَهُمَا فِي الْجَنَّةِ وَإِيَّاكُمْ وَالْكَذِبَ فَإِنَّهُ مَعَ الْفُجُورِ وَهُمَا فِي النَّارِ وَلَا تَحَاسَدُوا وَلَا تَبَاغَضُوا وَلَا تَقَاطَعُوا وَلَا تَدَابَرُوا وَكُونُوا إِخْوَانًا كَمَا أَمَرَكُمْ اللَّهُ تَعَالَى
حَدَّثَنَا عَبْدُ الرَّحْمَنِ بْنُ مَهْدِيٍّ وَأَبُو عَامِرٍ قَالَا حَدَّثَنَا زُهَيْرٌ يَعْنِي ابْنَ مُحَمَّدٍ عَنْ عَبْدِ اللَّهِ يَعْنِي ابْنَ مُحَمَّدِ بْنِ عَقِيلٍ عَنْ مُعَاذِ بْنِ رِفَاعَةَ بْنِ رَافِعٍ الْأَنْصَارِيِّ عَنْ أَبِيهِ رِفَاعَةَ بْنِ رَافِعٍ قَالَ سَمِعْتُ أَبَا بَكْرٍ الصِّدِّيقَ رَضِيَ اللَّهُ عَنْهُ يَقُولُ عَلَى مِنْبَرِ رَسُولِ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ سَمِعْتُ رَسُولَ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ يَقُولُ فَبَكَى أَبُو بَكْرٍ حِينَ ذَكَرَ رَسُولَ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ ثُمَّ سُرِّيَ عَنْهُ ثُمَّ قَالَ سَمِعْتُ رَسُولَ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ يَقُولُ فِي هَذَا الْقَيْظِ عَامَ الْأَوَّلِ سَلُوا اللَّهَ الْعَفْوَ وَالْعَافِيَةَ وَالْيَقِينَ فِي الْآخِرَةِ وَالْأُولَى

Fig. 7: Statistics
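Occurrence counts of this kind could be gathered with a few lines of Python. The sketch below is only an illustration of the counting step, not the tool used for the slides; the code points and the names `diacritic_counts` and `fatha_share` are assumptions, and the short sample is a single escaped word rather than the corpus behind Fig. 7.

```python
from collections import Counter

FATHA = "\u064E"
# Assumed code points of the eight diacritics (U+064B .. U+0652).
DIACRITICS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652")


def diacritic_counts(text):
    """Count how often each diacritic occurs in the text."""
    return Counter(ch for ch in text if ch in DIACRITICS)


def fatha_share(text):
    """Fraction of all diacritics in the text that are Fatha."""
    counts = diacritic_counts(text)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no diacritics")
    return counts[FATHA] / total


# Ha+Fatha, Dal+Shadda+Fatha, Tha+Fatha, Nun+Fatha, Alef (escaped for clarity).
sample = "\u062D\u064E\u062F\u0651\u064E\u062B\u064E\u0646\u064E\u0627"
print(diacritic_counts(sample))
print(fatha_share(sample))
```

On a large diacritized corpus, a Fatha share close to one half is what would make a one-bit Fatha versus non-Fatha split attractive, since both bit values would then find a matching diacritic about equally often.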

Using Diacritics To Hide Data
Analysis indicates that in Standard Arabic the frequency of one diacritic, namely Fatha, is almost equal to the combined frequency of the other seven diacritics. Therefore, a Fatha is assigned the bit value 1, and any of the remaining seven diacritics represents a 0. Use a cover medium that is empty of diacritics.
Fig. 8: Diacritized and non-diacritized text

To encode a value of 1, the algorithm looks for the first location where a Fatha can be placed and inserts a Fatha into the text. The location is determined by the rules of Standard Arabic grammar and syntax, so the result stays syntactically correct. Alternatively, the cover medium can be compared to a copy of itself that is already diacritized, which is faster and less complex.

Implementation Example
If another 1 needs to be inserted, the algorithm looks for the next location where a Fatha can be placed and adds the Fatha. Otherwise, to insert a bit value of 0, the algorithm locates the next position where any of the other diacritics can be inserted and adds that diacritic. This process is repeated for as long as there are bits remaining to be hidden.
Fig. 9: Encoding the sequence using diacritics
قـال الشيـخ الإمـام الـحـافظ أبـو عبـد الـلـه محمد بن إسماعيـل بن إبراهـيـم بن الـمـغيرة الـبخاري رحـمـه الـلـه تعـالـى آمـين
Fig. 10: Encoding the same sequence using Kashida
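The embedding loop described above can be sketched in Python using the simpler variant that works from an already diacritized copy of the cover. The sketch makes several assumptions that the slides do not state: diacritics are handled as individual Unicode marks, a mark that does not match the current bit is dropped, and every diacritic after the last hidden bit is also dropped so that the reader needs no length field. The names `embed` and `extract` are hypothetical.

```python
FATHA = "\u064E"
# Assumed code points of the eight diacritics (U+064B .. U+0652).
DIACRITICS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652")


def embed(diacritized_cover, bits):
    """Keep a Fatha for each 1 and a non-Fatha diacritic for each 0; drop the rest."""
    output = []
    position = 0                               # index of the next bit to hide
    for ch in diacritized_cover:
        if ch not in DIACRITICS:
            output.append(ch)                  # letters and spaces pass through
            continue
        if position < len(bits):
            wants_fatha = bits[position] == "1"
            if (ch == FATHA) == wants_fatha:
                output.append(ch)              # this mark carries the current bit
                position += 1
        # any other diacritic is omitted from the stego text
    if position < len(bits):
        raise ValueError("cover has too few suitable diacritics for the message")
    return "".join(output)


def extract(stego_text):
    """Read the surviving diacritics back as bits: Fatha is 1, anything else is 0."""
    return "".join("1" if ch == FATHA else "0"
                   for ch in stego_text if ch in DIACRITICS)


# Ha+Fatha, Dal+Shadda+Fatha, Tha+Fatha, Nun+Fatha, Alef (escaped for clarity).
cover = "\u062D\u064E\u062F\u0651\u064E\u062B\u064E\u0646\u064E\u0627"
stego = embed(cover, "101")
print(extract(stego) == "101")
```

Dropping non-matching marks is what makes the output carry fewer diacritics than the fully diacritized source, and it means a run of identical bits consumes cover text faster than an alternating pattern does.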

Reusing The Cover Media
The output file will have fewer diacritics than the original cover medium, because diacritics are deleted during embedding. Reusing the same document more than once therefore means less capacity. A research group at IBM has proposed techniques for restoring Arabic diacritics based on maximum entropy.
Fig. 11: Error rate in % for n-gram diacritic restoration

Results
Compared to other techniques, capacity is the highest if a fully diacritized document is used as the cover medium. Ambiguity depends on the reader’s familiarity with the Arabic language. Robustness is high, since the hidden data can withstand printing, retyping, font changing, and OCR.
Table 1: Diacritics Technique
File Type   File Size (Bytes)   Cover Size (Bytes)   Capacity (%)
.txt        10,356              318,                 %
.wav        43,468              1,334,               %
.jpg        23,796              717,                 %
.cpp        10,356              318,                 %
Average                                              3.27 %
Table 2: Kashida Technique
File types: .txt, .html, .cpp, .gif. Average capacity: 1.22 %
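The Capacity (%) column appears to be the size of the hidden file relative to the size of the cover text needed to carry it. The slides do not state the formula, so this is an assumption, although the visible digits of Table 1 are consistent with it. A minimal Python sketch of that calculation follows; the function names and the example numbers are hypothetical, not values from the tables.

```python
def capacity_percent(hidden_bytes, cover_bytes):
    """Hidden payload size as a percentage of the cover size (assumed definition)."""
    if cover_bytes <= 0:
        raise ValueError("cover size must be positive")
    return 100.0 * hidden_bytes / cover_bytes


def average_capacity(rows):
    """Mean capacity over (hidden_bytes, cover_bytes) pairs."""
    return sum(capacity_percent(h, c) for h, c in rows) / len(rows)


# Hypothetical measurements, chosen only to show the arithmetic.
rows = [(10, 1000), (30, 1000)]
print(capacity_percent(25, 1000))
print(average_capacity(rows))
```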

Analysis
Advantages:
- The approach is easily implemented in software.
- It produces high capacity.
- It can be modified for more ambiguity (by using one of the diacritics as a dummy diacritic or as a switching diacritic).
- It is fairly robust: it can withstand OCR, retyping, printing, and font changing.
Disadvantages:
- Medium to low ambiguity.
- Sending an Arabic message with diacritics might raise suspicion nowadays.
- Arabic fonts have different encodings on different machines, so the method can be computer dependent.