Recent Question/Assignment

IFN648 (2022 Sem 1) — Assignment 1 Historical Ciphers & Cryptanalysis
This assignment is worth 40 marks plus 5 extra-credit marks (which get added to your total to increase your final percentage for this assignment, even if it then exceeds 100%). The due date for submission will be no earlier than 14 full days following release; the exact deadline is as stated on the Blackboard submission page.
Challenges. You must have obtained the assignment paper since you are reading it, but you also need to download the accompanying ZIP file — challenges.zip — available on Blackboard alongside this document. The ZIP file contains 100 directories with names from 00 to 99. Yours is the one whose name is the last two digits of your QUT student ID number (e.g., if your ID is 01234567, use the challenge directory named 67). Your challenge set further contains two subdirectories with various files in them, described below. Be sure to use your own set, and include your student ID and full name at the top of your answer file as instructed.
Submission. There are two types of answers sought in this assignment: (1) written explanations in words and in pictures; and (2) specific solutions to cryptanalytic challenges.
1 Submit all your answers of the first type by uploading a single PDF file via a Blackboard submission link: under Assessment, find the item “Assignment 1 report”. This file is only for answering the open-ended questions of this assignment, and the only formatting requirement is that it be clear and legible.
2 Submit your answers of the second type, by uploading two ASCII (plain text) files via two more submission links named “Assignment 1 identification challenges” and “Assignment 1 matching challenges”. Those two files must be formatted exactly as described in §1 and §2 below. Create those two files using a plain text editor, as those files must be in plain text without any formatting, control, and/or non-ASCII characters. DO NOT use a word processing software for this; and if your computer uses a default language other than US/UK/AUS English, make sure that your text files are encoded in plain ASCII (the ASCII subset of UTF-8).
It is strongly recommended you, either: use a programmer’s editor for this assignment, i.e., something you’d use to edit the source code of a C or Java program or LaTeX document; or use a general-purpose plain-text editor such as nano (on Linux), TextEdit (on MacOS) or Notepad (on Windows). If you choose to use a multi-language editor or a word processor in spite of this warning, you do so at your own risk and must make

absolutely certain that you can prevent your editor from embedding invisible formatting characters or header metadata into your saved file (this is probably impossible in MS-Word). On Linux or MacOS, the command file filename will tell you the type of a file (the type should contain “ASCII text”), and the command hexdump -C filename will show you the file’s exact content byte-for-byte regardless of its type.
Thus, for your submission to be complete you must upload 3 files — one PDF and two ASCII — via 3 separate Blackboard links. The names of your answer files is not critical, but I recommend you name the machinereadable ASCII files using your QUT ID followed by the question number (Q1 and Q2) when uploading. Make sure your upload each file using the right dedicated link; if you make a mistake and upload two files using the same link, Blackboard will overwrite the former with the latter, so you’ll have to upload them again.
In the rest of this assignment, a box surrounding a block of text in monospace (“typewriter”) font text is meant to be an example of how you should format something in one of your text answer files.
0 Cipher information for both challenges
Both sets of challenges in this assignment were created using one out of a collection of eight (8) different cipher algorithms, which we designate using the numbers 1, 2, 3, 4, 5, 6, 7, 8. Nothing is said about the keys used, but the cipher algorithms, denoted 1–8, are the following:
1: the “null” cipher, a.k.a. the identity function;
2: a Cæsar cipher (on the 26-letter alphabet);
3: a simple random substitution cipher;
4: a simple transposition cipher;
5: a Vigen`ere cipher;
6: a “manual” one-time pad (see below);
7: an “electronic” Vernam cipher (see below);
8: AES-256 (in CBC mode with implicit ESSIV initialisation vector; don’t worry about the details).
The substitution ciphers numbered 2, 3, 5, 6 only work on the case-insensitive/case-preserving 26-letter Roman alphabet. All other characters (i.e., whitespaces, if present) are simply passed through from the plaintext into the ciphertext. Additionally, if the plaintext contains (upper/lower)-case information, that information is also passed through from the plaintext onto the corresponding letter of ciphertext (e.g., if a?s, then A?S).
The remaining ciphers numbered 1, 4, 7, 8 treat their input as strings of ASCII characters (or bytes, if you prefer), where each letter and each whitespace counts as one such ASCII byte. Those ciphers operate on strings of those bytes without pre- or post-processing outside of the cipher (i.e., whitespaces are not removed and letter case is not changed). Notice that the only distinction between cipher 6 from the previous paragraph and cipher 7 here is that the former operates on letters, while the latter operates on ASCII bytes (and there are many possible values of those bytes that do not encode letters).
All that you need to know about cipher 8, for this assignment, is that the AES cipher performs a rather complex transformation on whole blocks of 128 bits (16 ASCII characters) at a time. In CBC mode, this complex transformation somehow varies, so that two identical blocks in the plaintext will almost certainly give different corresponding blocks in the ciphertext. The “null” cipher number 1 needs no explanation.
1 Cipher identification in ciphertext-only attacks
In your individual data folder, you will find a sub-folder named Ident, and inside it, sixteen (16) ASCII text files named A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P. Each of those files is a ciphertext, obtained by encrypting some English plaintext (all different, and all with their own unspecified key), each using one of the eight cipher algorithms, numbered 1, 2, 3, 4, 5, 6, 7, 8, specified in §0.
All original plaintexts were in English: eight of the sixteen plaintexts contained a mix of upper/lower-case letters and whitespaces; the remaining eight containing only sinle-case (uppercase) letters with all spaces removed. (Note that knowing this should make your task significantly easier, but this fact may interfere with your analysis if you are blindly trying to apply automated tools without understanding what you or they are doing.)
Each of the eight ciphers was used to encrypt (using unique keys) two of the sixteen ciphertexts (8 × 2 = 16). You are also told that all original plaintexts had the same length: 997 bytes (letters & spaces) for each plaintext.
1.1 Cipher identification strategy [12 marks]
In this first part we will consider how to relate the generic features of the various ciphers to what we would expect to see in the ciphertexts, in order to identify the ciphers used. Note that this question is only about strategy; you will deal with the actual challenges in §1.2. This is also not about recovering messages or keys.
a. Briefly explain how frequency analysis (and what kind) could in principle be used to disambiguate between

the following ciphertexts (i.e., explain what you would be focusing on, in frequency analysis, if you already knew that the candidate ciphertext was one of these): (5 marks)
• a Cæsar ciphertext (cipher number 2 on the list);
• a simple random transposition ciphertext (cipher number 4);
• a simple random substitution ciphertext (cipher number 3);
• a Vigen`ere ciphertext (cipher number 5);
• a letters-only one-time pad (cipher number 6).
b. Now suppose you have narrowed a ciphertext down to either the Vigen`ere cipher (number 5) or the lettersonly one-time pad (number 6). Briefly explain what additional criterion (different from the above) you could use to help distinguish the two (again you can now assume you know for sure it is one of those two). Would your method already provide you with some clues about the key, if the indication was that the ciphertext is a Vigen`ere ciphertext? Explain. (2 marks)
c. Now illustrate your explanation above by working through a specific example taken from your identification

challenges It is up to you whether you wish to rely on screenshot(s) from CrypTool or similar, or make your point directly on a quoted portion of ciphertext; but you have to explain clearly what your are doing. (Note: you may wish to skip this task at first, and come back to it once you have actually identified at least one Vigen`ere ciphertext(s), in the next question.) (2 marks)
Hint: for this part, removing all spaces and converting all letters to lowercase will make your life easier, since as stated above ciphers 5 and 6 ignore the case and skip all non-letters when encrypting or decrypting.
d. AES is not a historical cipher, but a modern block cipher operating on blocks of 128 bits or 16 bytes. Figure out a way to distinguish, with almost absolute certainty, between the Vernam and AES-256-CBC ciphertexts on the one hand, and the six other ciphertext types on the other hand. (1 mark) Hint: you can attempt to view the content of a file using the command cat filename in a terminal. If the file contains gibberish or unprintable characters, hexdump -C filename will show you a side-by-side binary and text representation of the file content; this works for any file.
e. Suppose you have positively separated the Vernam and AES ciphertexts from the others. Would it be feasible, generally (i.e., every time), to distinsguish AES-CBC from Vernam encryption? If your answer is “yes”, explain your method and illustrate it with an example from your challenge set). If your answer is “no”, can you figure out a way to tell the AES-CBC from the Vernam ciphertexts in the specific challenge set you were given? (2 marks)
Hint: have a look at the exact size of your ciphertext files, in bytes (beware of rounding in “user-friendly” graphical file managers). In a Linux or MacOS terminal, the command ls -l will show this.
1.2 Cipher identification challenge [12 marks]
Your practical task here is to identify the cipher used to create each of the sixteen challenges in your identification challenge folder. You may proceed in any way you see fit, whether or not it follows the strategy you outlined above, as long as you work alone. You do not need to provide any explanation, just the answers. If external resources need to be acknowledged, do so in your report—not in your machine-readable ASCII text answer file.
Present your answer as an ASCII text file, whose first line reads: Q1 followed by a space and your 8-digit ID,

followed by another space and your full name. Next, make eight rows each starting with the single-digit number of one of the eight cipher types (numbered from 1 to 8, in order, according to the list of ciphers in §0). Append on each row the single-letter name of the ciphertext(s) you identify as being of that cipher type, each preceded by a single space. A ciphertext letter-name must not appear more than once per row. Each ciphertext should ideally appear exactly once in the whole table, but it is okay for a ciphertext to be listed more than one (on different rows) or not at all, if you have doubts. Upload your file on Blackboard as indicated earlier.

The following is an example of a suitably formatted answer file (though partially incorrect since the correct answer would have exactly two ciphertexts per cipher). You would save your own answer file givint it a name like 01234567-Q1.txt and upload it using the correct link.
Q1 01234567 Harry Potter
1 C
2 A B
3 E
4 D N P
5 G
6
7 F
8 F
The marking criteria for this task are as follows:
+0.75 mark for each ciphertext filename (e.g., ‘B’) placed on the row of its correct cipher (e.g., ‘2’).
-0.25 mark (a negative half-mark) for each ciphertext-name letter appearing in an incorrect row.
default to 0 mark for this practical task if your file is not a text file, if your table is formatted incorrectly, if it creates an ambiguity (e.g., an incorrect ID; or two rows with the same cipher index), or if there is attempt to abuse the marking rules (e.g., the same ciphertext named twice on the same row) — but you are allowed to omit some ciphertexts, or list the same ciphertext against different ciphers.
minimum of 0 mark: your score for this practical task will be raised to 0 if it falls below. maximum of 12 marks = 16 × 0.75 for a perfect answer.
Some hints and advice:
• You have to tell which cipher was actually used to make each ciphertext, not which one could have been. Don’t say ‘7’ (Vernam) for all, even though any ciphertext could theoretically be “explained” by some one-time-pad key!
• You may choose to list the same ciphertext against two or even three different ciphers if you cannot decide (e.g., ‘F’ on rows 7 and 8 in the example). If you list two and one is correct, you will earn a half-mark (= +0.75-0.25); but if both are incorrect you would lose a half mark (= -0.25-0.25). Use with caution. • If you have really no idea about a ciphertext, your safest bet is to omit it entirely from your answer (e.g., there is no mention of H in the above example).
• Glancing at the cipher list, you may think some of the ciphertexts are just infeasible to tell apart, but appearances may be deceiving. Work out or peruse the strategy questions, in §1.1, before jumping to hasty conclusions.
2 Indistinguishability cryptanalysis against classical ciphers
In your individual dataset, will be a second folder named Match, with twenty ASCII text files inside. Of those, the ten (10) named C0, C1, ..., C9 are ciphertext files; and the ten (10) named P0, P1, ..., P9, are plaintext files.
You are told that there is a one-to-one correspondence between the plaintexts and the ciphertexts, i.e., each ciphertext is the encryption of a unique plaintext (chosen at random without redraw) amongst the ones provided. The ciphers used are a subset of the ones listed in §0. Specifically, those four ciphers were used, as follows:
• two (2) ciphertexts are from the simple random substitution cipher (number 3);
• three (3) ciphertexts are from the simple transposition cipher (number 4);
• three (3) ciphertexts are from the Vigen`ere cipher (number 5);
• two (2) ciphertexts are from the case-insensitive letters-only one-time pad (number 6).
As you can see, all the plaintexts are in English, and some, but not all, have white spaces and/or a mix of lower/upper-case letters in them. See §0 for details of how case information and whitespaces are used or ignored by each of the four ciphers used in this challenge. As you can also verify, all challenge plaintexts, whether or not they contain whitespaces, have the same length.
2.1 Plaintext matching challenge [10 marks]
Your practical task is to perform a distinguishing attack on the provided ciphertexts, with the help of the provided plaintexts. A distinguishing attack seeks to break the “indistinguishability” security property of a cipher, by figuring out which one of two (or here, ten) given plaintexts, a challenge ciphertext corresponds to.
In its simplest, canonical definition, a distinguishing challenge consists of a single ciphertext along with two plaintexts of the same length, and the cryptanalyst’s task is to decide which one is correct between the two. The challenge you are facing here is similar, except that your task is to match the 10 given ciphertexts to the 10 given plaintexts. The correct answer is thus a one-to-one mapping from ciphertexts to plaintexts.
Present your answer as an ASCII text file starting with the characters Q2, a space, your 8-digit QUT ID number

on the first line, another space, and your full name, all on the first line. Then on each subsequent line state the two-character filename of one ciphertext followed by a dash then the two-character filename of a plaintext (no spaces) to indicate a matching. No explanations needed, but the work must be your own. If you must acknowledge outside sources, do so in the PDF report. Upload your file on Blackboard, per earlier instructions.

Here is a well-formatted (but not necessarily correct) example of answer file. Create your own answer file to look like this, give it a name such as 98765432-Q2.txt, and upload it using the correct link.
Q2 98765432 Hermione Granger
C0-P9
C1-P0
C2-P6
C5-P5
C6-P3
C7-P7
C8-P4
C9-P8
C1-P8
Again, the correct answer is a one-to-one mapping; but if you hesitate between two or more choices, it is okay to hedge your bets and not provide a perfect mapping. You may choose to leave out some plaintexts or ciphertexts (e.g., the example omits P1, P2, C3, C4); and you may also choose to match the same ciphertext with two different plaintexts and vice versa (e.g., the mappings C1-P0, C1-P8, C9-P8 overlap twice: at C1 and at P8). However, before doing this, be sure to understand the marking criteria. Do not try to give the same matching more than once.
The marking criteria for this cryptanalysis task are as follows:
+1 mark for each correct given matching provided it appears once only.
-0.5 mark for each incorrect given matching (but not bringing the overall score for this task below 0). 0 mark for the task if your answer file is missing, ill-formatted, not an ASCII text file, or has incorrect ID. minimum of 0 mark: your score for the task will be raised to 0 if negative. maximum of 10 marks: for a perfect answer.
2.2 Attacking the case-insensitive letter-only “one-time pad” [6 marks + 5 bonus]
Two (2) ciphertexts in your matching-challenge dataset have been created using case-insensitive one-time pads, applied to mixed-case plaintexts (cipher 6) — let’s call this cipher number 6, the “OTP-ish cipher”. As described in §0, in OTP-ish the upper/lower case information is merely preseved, unchanged, from each letter of plaintext to the corresponding letter of ciphertext, and so are the whitespaces.
a. Without (yet) going into technical details, what can you say about the OTP-ish cipher in terms of perfect secrecy? Briefly elaborate. (1 mark)
b. Could this OTP-ish cipher lead to a successful cryptanalysis (in the given distinguishing-attack scenario)? Say whether yes or no, and crisply explain how or why not. (1 mark)
c. Concretely demonstrate the (in)effectiveness of your candidate attack by walking through one example from your challenges. (2 marks)

d. Would a ciphertext-only plaintext-recovery attack (i.e., given only C0,...C9, recover P0,...,P9) be significantly harder than the distinguishing attack you did (i.e., given both C0,...C9, and P0,...,P9, find the matching)? As in this whole sequence of questions, argue your answer focusing on the OTP-ish cipher (number 6) exclusively. (2 marks)
e. Now crystallise and generalise your argument from the preceding subquestion, by appealing to information theory. Using information-theoretic notions and calculations, justify precisely why a distinguishing attack with 10 given plaintexts and 10 given ciphertexts (as you did in §2.1) would be, or would not be, significantly easier that a ciphertext-only message-recovery attack (where you are given some ciphertext and have to find the corresponding plaintext). In both cases, assume that all plaintexts are English and that all ciphertexts are made from those using the “OTP-ish” cipher only (ignore all other ciphers). Hints: You can assume that uppercase letters in the plaintext occur independently with probability , i.e., regardless of the other letters. The plaintexts are otherwise written in English, hence with a (caseinsensitive) information rate typical of English. Feel free to make a reasonable assumption regarding the length of the plaintext(s), if that helps you illustrate your point. (5 extra-credit marks)
End of paper. Good luck!
In the appendix on the next page are examples of correctly formatted, ASCII text-only, answer files.
A Formatting template for ASCII answer files
Question 1 upload example — filename: 87654321-Q1.txt
Q1 87654321 Gandalf The Gray
1 C
2 A B
3 E
4 D N P
5 G
6
7 F
8 F
Question 2 upload example — filename: 87654321-Q2.txt
Q2 87654321 Gandalf The Gray
C0-P9
C1-P0
C2-P6
C5-P5
C6-P3
C7-P7
C8-P4
C9-P8
C1-P8