Cracking The PDF Document Password With The Help Of ChatGPT

In this blog post, we’ll explore how ChatGPT helped me crack the password of a PDF document.

Background Story

About 15 years ago, I invested some money in a mutual fund (i.e. managed funds or ETFs) in the name of one of my brother’s sons. I had almost forgotten about the investment. A few days ago I wanted to get hold of the statement, so I called the fund house’s customer care and asked them to send the statement of my investment. I received the statement in PDF format, protected with a password. The password is my brother’s son’s PAN (the equivalent of a Tax File Number or IRS/IRD number), which uniquely identifies each person for tax filing purposes. I had also lost the PAN of my brother’s son.

So I started looking for ways to crack the password of the PDF document. I tried some online websites, but because the document has a complex password they could not crack it. Then I came across pdfrip, a PDF password cracking utility written in Rust.

pdfrip supports dictionary attacks; date, number-range, and alphanumeric brute-forcing; and a custom query builder for password formats.

The PAN (the password of the PDF) follows a fixed format, so my brother’s son (Harshit Mohan) suggested that we could use a dictionary attack by generating all possible PANs.

PAN Format

Permanent Account Number, abbreviated as PAN, is a unique 10-character alphanumeric identifier issued by the Income Tax Department to Indian taxpayers.

PAN has a typical format of ABCTY1234D.

  • The first three characters, i.e. ‘ABC’ in the above number, are an alphabetic series ranging between AAA and ZZZ.
  • The 4th character, i.e. ‘T’, represents the PAN holder’s status. The letter ‘T’ represents Trust, ‘F’ Firm, ‘H’ HUF, ‘P’ Individual, ‘C’ Company, etc.
  • The 5th character, i.e. ‘Y’, represents the first letter of the PAN holder’s last name.
  • The next four characters are sequential digits ranging between 0001 and 9999.
  • The 10th character, i.e. ‘D’, is an alphabetic check digit that runs from A to Z.
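The layout described by these rules can be expressed as a regular expression. The following is a minimal sketch, not an official validator: the names `PAN_PATTERN` and `looks_like_pan` are made up for illustration, and the status set is limited to the five codes named in the list even though the "etc." means the real set is larger.

```python
import re

# Pattern built from the rules listed above: 3 letters, a status letter,
# the surname initial, 4 digits, and a trailing check letter.
# Assumption: only the status codes named in the list (T, F, H, P, C) are accepted.
PAN_PATTERN = re.compile(r"[A-Z]{3}[TFHPC][A-Z][0-9]{4}[A-Z]")


def looks_like_pan(candidate):
    """Return True if candidate matches the 10-character PAN layout."""
    if PAN_PATTERN.fullmatch(candidate) is None:
        return False
    # The sequence runs from 0001 to 9999, so 0000 is excluded.
    return candidate[5:9] != "0000"


print(looks_like_pan("ABCTY1234D"))  # sample PAN from the format description
print(looks_like_pan("ABCTY0000D"))  # sequence 0000 is outside the range
print(looks_like_pan("ABCTY123D"))   # only three digits
```

A check like this only confirms the shape of a candidate. It says nothing about whether the PAN was ever issued or whether the check letter is the right one.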

Under these rules there are approximately 118.8 billion possible PANs for individuals (26^3 × 26 × 9,999 × 26). We can reduce the candidates because we already know the first letter of the last name, which brings the count down to about 4.57 billion. That is still quite a lot for a dictionary attack.
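The size of the search space follows directly from the format rules and can be checked with a few lines of arithmetic. This is only a sketch: `count_pans` is a hypothetical helper, and it assumes the status character is fixed to ‘P’ and that every check letter from A to Z has to be tried.

```python
LETTERS = 26      # A to Z
SEQUENCES = 9999  # 0001 through 9999


def count_pans(unknown_prefix_letters, surname_known):
    """Count candidate PANs for status 'P'.

    unknown_prefix_letters: how many of the first three letters are unknown (0 to 3).
    surname_known: True if the 5th character (surname initial) is already known.
    """
    surname_choices = 1 if surname_known else LETTERS
    return LETTERS ** unknown_prefix_letters * surname_choices * SEQUENCES * LETTERS


all_individuals = count_pans(3, surname_known=False)
known_surname = count_pans(3, surname_known=True)
known_first_letter = count_pans(2, surname_known=True)

print(all_individuals)     # 118801878624
print(known_surname)       # 4569303024
print(known_first_letter)  # 175742424

# Each wordlist line is 10 characters plus a newline.
print(known_first_letter * 11)  # 1933166664 bytes
```

The last figure shows why fixing one more letter matters: at 11 bytes per line, each fixed starting letter corresponds to a wordlist of a little under 2 GB.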

When I looked at the PANs of all the family members, I found that even though the first three letters could range from AAA to ZZZ, most of them start with ‘A’ and two of them start with ‘B’. So it seems the Income Tax Department issued PANs starting with the letter ‘A’ first and, once those combinations were exhausted, started issuing PANs starting with the letter ‘B’. Now we were fairly sure that the PAN we were looking for starts with either ‘A’ or ‘B’, which reduces the candidates further, to about 176 million per starting letter.

Now we wanted to write a program to generate all possible PANs starting with the letter ‘A’.

My brother’s son suggested that we could use ChatGPT to quickly generate the program instead of writing it on our own.

So we gave ChatGPT the following prompt to generate a Python program.

Write a python program to generate all possible PAN numbers for individuals starting with A letter write them to a file based on following rules. PAN has a typical format of ABCTY1234D. The first three characters, i.e. ‘ABC’ in the above number is an alphabetic series ranging between AAA and ZZZ. The 4th character, i.e. ‘T’, represents the PAN holder’s status. The alphabet ‘T’ represents Trust, ‘F’ for Firm, ‘H’ for HUF, ‘P’ for Individual, ‘C’ for Company etc. The 5th character, i.e. ‘Y’, represents the first alphabet the PAN holder’s last name. The next four characters are sequential digits ranging between 0001 and 9999. The 10th character, i.e. ‘D’, is an alphabetic check digit that runs from A to Z.

ChatGPT gave the following code.

def generate_pan_numbers():
    pan_numbers = []
    for first_char in 'A':
        for second_char in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
            for third_char in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
                for status in 'P':  # Only 'P' for individuals
                    for last_name_first_char in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
                        for seq_digits in range(1, 10000):
                            pan = f"{first_char}{second_char}{third_char}{status}{last_name_first_char}{seq_digits:04d}"
                            pan_numbers.append(pan)
    return pan_numbers

def write_to_file(pan_numbers, filename):
    with open(filename, 'w') as file:
        for pan in pan_numbers:
            file.write(pan + '\n')

if __name__ == "__main__":
    pan_numbers = generate_pan_numbers()
    write_to_file(pan_numbers, 'pan_numbers.txt')
    print("Generated PAN numbers and saved them to pan_numbers.txt file.")

We ran the above program to generate all possible PANs, but it was taking a lot of time because it tries every combination starting with ‘A’ for every possible surname initial. (As written, it also leaves out the 10th check character.)

As I already knew the first letter of the surname, I could reduce the number of PANs to test against. So I changed the prompt as below.

Write a python program to generate all possible PAN numbers for individuals starting with A letter , last name starting with ‘G’ and write them to a file based on PAN rules.

I got the following program from ChatGPT.

import itertools
import string
import time


def generate_pan_numbers():
    first_three_chars = 'A'
    last_name_first_char = 'G'
    status_char = 'P'
    check_digits = string.ascii_uppercase  # 'A' to 'Z'

    pan_numbers = []
    for second_char, third_char in itertools.product(string.ascii_uppercase, repeat=2):
        for seq_digits in range(1, 10000):
            for check_digit in check_digits:
                pan = f"{first_three_chars}{second_char}{third_char}{status_char}{last_name_first_char}{seq_digits:04d}{check_digit}"
                pan_numbers.append(pan)
    return pan_numbers

def write_to_file(pan_numbers, filename):
    with open(filename, 'w') as file:
        for pan in pan_numbers:
            file.write(pan + '\n')

def main():
    start = time.time()   
    pan_numbers = generate_pan_numbers()
    write_to_file(pan_numbers, "pan_numbers.txt")
    print("Generated PAN numbers and saved them to pan_numbers.txt file.")
    end = time.time()
   
    print(end - start)

if __name__ == "__main__":
    main()

When I ran the above program on my laptop (Intel Core i7-8550U processor), it took around 200 seconds to generate all possible individual PANs starting with the letter ‘A’ whose last-name initial is ‘G’. The generated file is approximately 2 GB in size.
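Both Python programs above collect every candidate in a list before writing anything, which keeps well over a hundred million strings in memory at once. A possible alternative, sketched here as an assumption rather than something we ran, is to yield candidates lazily and write them as they are produced. The names `iter_pans` and `write_pans` and the file name `pan_sample.txt` are hypothetical.

```python
import itertools
import string


def iter_pans(first_letters="A", surname_initial="G", status="P"):
    """Yield candidate PANs one at a time instead of building a list."""
    letters = string.ascii_uppercase
    for first, second, third in itertools.product(first_letters, letters, letters):
        prefix = f"{first}{second}{third}{status}{surname_initial}"
        for seq in range(1, 10000):
            body = f"{prefix}{seq:04d}"
            # Try every check letter, since the correct one is not known.
            for check in letters:
                yield body + check


def write_pans(filename, pans):
    """Write candidates one per line and return how many were written."""
    count = 0
    with open(filename, "w") as out:
        for pan in pans:
            out.write(pan + "\n")
            count += 1
    return count


# Small demo: only the first three candidates, so nothing large is generated.
print(list(itertools.islice(iter_pans(), 3)))
```

Calling `write_pans("pan_sample.txt", iter_pans())` would write the full set for the letter ‘A’, and passing `first_letters="AB"` would cover both starting letters seen in the family’s PANs. Memory use stays flat either way because only one candidate exists at a time.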

We thought that using Rust might speed up the PAN generation even further.

So we prompted ChatGPT to generate the same program in Rust so that we could take advantage of multi-threading.

ChatGPT gave the following code, but it is not a multi-threaded program.

use std::fs::File;
use std::io::{Write, Result};

fn generate_pan_numbers() -> Vec<String> {
    let mut pan_numbers = Vec::new();

    for second_char in b'A'..=b'Z' {
        for third_char in b'A'..=b'Z' {            
                for seq_digits in 1..=9999 {
                    let check_digit = ((seq_digits % 26) as u8 + b'A') as char;
                    let pan = format!(
                        "A{}{}PG{:04}{}",
                        second_char as char,
                        third_char as char,                       
                        seq_digits,
                        check_digit
                    );
                    pan_numbers.push(pan);
                }           
        }
    }

    pan_numbers
}

fn write_to_file(pan_numbers: &[String], filename: &str) -> Result<()> {
    let mut file = File::create(filename)?;

    for pan in pan_numbers {
        file.write_all(pan.as_bytes())?;
        file.write_all(b"\n")?;
    }

    Ok(())
}

fn main() {
    let pan_numbers = generate_pan_numbers();
    match write_to_file(&pan_numbers, "pan_numbers.txt") {
        Ok(_) => println!("Generated PAN numbers and saved them to pan_numbers.txt file."),
        Err(e) => eprintln!("Error writing to file: {}", e),
    }
}

We had to prompt ChatGPT explicitly to generate a multi-threaded Rust program that generates all possible PANs.

ChatGPT gave the following code.



use std::fs::File;
use std::io::{Write, Result};
use std::sync::{Arc, Mutex};
use std::thread;

fn generate_pan_numbers() -> Vec<String> {
    let pan_numbers = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..26).map(|i| {
        let pan_numbers = Arc::clone(&pan_numbers);
        thread::spawn(move || {
            let mut nums = pan_numbers.lock().unwrap();
            for second_char in b'A' + i..=b'Z' {
                for third_char in b'A'..=b'Z' {
                        for seq_digits in 1..=9999 {
                            let pan = format!(
                                "A{}{}PG{:04}{}",
                                second_char as char,
                                third_char as char,
                                // last_name_first_char as char,
                                seq_digits,
                                // Implementing the alphabetic check digit
                                ('A' as u8 + (seq_digits % 26) as u8) as char
                            );
                            nums.push(pan);
                        }
                }
            }
        })
    }).collect();

    for handle in handles {
        handle.join().unwrap();
    }

    Arc::try_unwrap(pan_numbers).unwrap().into_inner().unwrap()
}

fn write_to_file(pan_numbers: &[String], filename: &str) -> Result<()> {
    let mut file = File::create(filename)?;

    for pan in pan_numbers {
        file.write_all(pan.as_bytes())?;
        file.write_all(b"\n")?;
    }

    Ok(())
}

fn main() {
    let pan_numbers = generate_pan_numbers();
    if let Err(err) = write_to_file(&pan_numbers, "pan_numbers.txt") {
        eprintln!("Error writing to file: {}", err);
    } else {
        println!("Generated PAN numbers and saved them to pan_numbers.txt file.");
    }
}

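The above program did not generate all possible PANs: it derives the check letter from the sequence number (seq_digits % 26) instead of trying all 26 letters, and the per-thread ranges b'A' + i..=b'Z' overlap, so many entries are written more than once.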
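The gap can be made concrete with a short Python sketch that mirrors two details of the Rust code above: the check letter computed as `seq_digits % 26`, and the thread ranges that each start at `b'A' + i`. The helper names are hypothetical. The sketch assumes only what the format rules state, namely that the check character can be any letter from A to Z.

```python
import string

LETTERS = string.ascii_uppercase


def derived_check(seq):
    # Mirrors the Rust expression 'A' + (seq_digits % 26): one letter per number.
    return LETTERS[seq % 26]


def derived_candidates(prefix, seq):
    """Candidates when the check letter is computed from the sequence number."""
    return [f"{prefix}{seq:04d}{derived_check(seq)}"]


def full_candidates(prefix, seq):
    """Candidates when every check letter from A to Z is tried."""
    return [f"{prefix}{seq:04d}{letter}" for letter in LETTERS]


derived = derived_candidates("AAAPG", 1)
full = full_candidates("AAAPG", 1)
print(len(derived), len(full))  # 1 26

# Thread i walks second letters from 'A' + i to 'Z', so the 26 threads
# cover 26 + 25 + ... + 1 second-letter values instead of 26 distinct ones.
second_letter_visits = sum(26 - i for i in range(26))
print(second_letter_visits)  # 351
```

So for each sequence number the derived version writes only 1 of the 26 candidates, while the overlapping ranges repeat most second letters across threads.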

So we tried different variations of the prompt, but most of the generated programs either did not produce all possible PANs or failed with memory or overflow errors.

So we asked ChatGPT to write a memory-efficient program with the following prompt.

Write a multi thread RUST program to generate all possible PAN numbers for individuals starting with A letter and last name starting with ‘G’ write them to a file based on following rules . make sure that its memory efficient .

use rayon::prelude::*;
use std::fs::File;
use std::io::{BufWriter, Write};
use std::sync::Mutex;

fn generate_and_write_pan_numbers(filename: &str) {
    let file = File::create(filename).expect("Unable to create file");
    let writer = BufWriter::new(file);
    let writer = Mutex::new(writer);

    (b'A'..=b'Z')
        .into_par_iter()
        .flat_map(|second_char| {
            (b'A'..=b'Z')
                .into_par_iter()
                .flat_map(move |third_char| {
                    // Innermost letter range: every candidate check letter (10th character).
                    (b'A'..=b'Z')
                        .into_par_iter()
                        .flat_map(move |check_digit| {
                            (1..=9999).into_par_iter().map(move |seq_digits| {
                                // Layout: 'A' + two letters + status 'P' + surname initial
                                // + four digits + check letter = 10 characters.
                                format!(
                                    "A{}{}P{}{:04}{}",
                                    second_char as char,
                                    third_char as char,
                                    'G', // Last name starting with 'G'
                                    seq_digits,
                                    check_digit as char,
                                )
                            })
                        })
                })
        })
        .for_each(|pan| {
            let mut writer = writer.lock().unwrap();
            writeln!(writer, "{}", pan).expect("Unable to write data");
        });
}

fn main() {
    use std::time::Instant;
    let now = Instant::now();
    let filename = "pan_numbers.txt";
    generate_and_write_pan_numbers(filename);
    println!("Generated PAN numbers and saved them to {} file.", filename);
    let elapsed = now.elapsed();
    println!("Total time taken: {:.2?}", elapsed);
}

The program generated all possible PANs in 148 seconds on my laptop.

You could also write a multi-threaded Rust program that generates all possible PANs in the following way.

use std::fs::OpenOptions;
use std::char;
use std::io::Write;
use itertools::Itertools;
use rayon::iter::ParallelBridge;
use rayon::prelude::ParallelIterator;

fn main() {
    // Define the possible values for each position
    let first_letter = 'A';
    let status_code = 'P';
    let last_name_alphabet = 'G';

    let alphabet_series: Vec<String> = (b'A'..=b'Z')
        .cartesian_product(b'A'..=b'Z')
        .map(|(x, y)| format!("{}{}", x as char, y as char))
        .par_bridge()
        .collect();

    let digits: Vec<String> = (1..=9999)
        .map(|i| format!("{:04}", i))
        .collect();
    let check_digits: Vec<char> = (b'A'..=b'Z').map(|c| c as char).collect();

    // Generate all combinations
    let combinations: Vec<String> = alphabet_series.iter()
        .cartesian_product(&digits)
        .cartesian_product(&check_digits)
        .map(|((&ref alphabet, &ref digit), &check_digit)| format!("{}{}{}{}{}{}", first_letter, alphabet, status_code, last_name_alphabet, digit, check_digit))
        .par_bridge()
        .collect();

    let mut file = OpenOptions::new().create(true).append(true).open("pan_numbers.txt").expect("Unable to open file");

    // Print combinations
    file.write_all(&combinations.join("\n").as_bytes()).expect("write failed");

    println!("PAN numbers have been generated and written to 'pan_numbers.txt' file.");
}

The program generated all possible PANs in 115 seconds on my laptop.

Once I had the file with all possible PANs, I ran a dictionary attack with pdfrip and was able to crack the password of the document.

pdfrip -f <encrypted-filename.pdf> wordlist <pan_numbers.txt>

References

https://cleartax.in/glossary/permanent-account-number-pan/

https://github.com/mufeedvh/pdfrip
