|
Description
|
This dataset contains 800 Python code pairs illustrating migrations from pre-quantum to post-quantum cryptographic primitives. Each instance represents an independent migration case and includes both the classical implementation and its quantum-resistant counterpart, validated through static and dynamic testing.

The corpus is organized into seven categories covering the main families of applied cryptography:

- Symmetric encryption (AES-128/3DES → AES-256)
- Digital signatures (RSA, ECDSA, Ed25519 → Dilithium / SPHINCS+)
- Key exchange (DH, ECDH, X25519 → Kyber1024)
- Hash functions (MD5, SHA-1/2, BLAKE2 → SHA-3)
- Message Authentication Codes (CMAC/HMAC → HMAC-SHA3-512)
- Authenticated encryption (AES-GCM/CCM → ChaCha20-Poly1305)
- Hybrid schemes (combinations of the above)

Each record in the JSON Lines file includes the following fields: base_case_id, variation_id, migration_category, prequantum_algorithm, postquantum_algorithm, prequantum_code, postquantum_code, and description.

The dataset was built iteratively using human-in-the-loop generation with a large language model (OpenAI o4-mini-high): the model produced batches of ten variations, and each batch was then manually curated. Every snippet was validated with a unified testing framework that checks syntactic correctness and functional equivalence between the pre- and post-quantum implementations.

All examples are written in Python and designed to be self-contained and reproducible. They are intended to support research on cryptographic agility and LLM-assisted migration toward the PQC algorithms selected by NIST (Kyber, Dilithium, SPHINCS+; standardized as ML-KEM, ML-DSA, and SLH-DSA).
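
As a rough illustration of how the records described above might be consumed, the following sketch reads a JSON Lines file, checks that each record carries the documented fields, runs a syntax-only check on both code snippets, and counts records per category. The field names come from the description; the helper names (load_records, missing_fields, is_valid_python, summarize) are hypothetical, and the syntax check is only a stand-in that assumes nothing about the dataset's actual testing framework.

```python
import ast
import json
from collections import Counter
from pathlib import Path

# Field names as listed in the dataset description.
REQUIRED_FIELDS = (
    "base_case_id", "variation_id", "migration_category",
    "prequantum_algorithm", "postquantum_algorithm",
    "prequantum_code", "postquantum_code", "description",
)


def load_records(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                records.append(json.loads(line))
    return records


def missing_fields(record):
    """Return the documented fields that are absent from a record."""
    return [name for name in REQUIRED_FIELDS if name not in record]


def is_valid_python(source):
    """Syntax check only: parse the snippet without executing it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def summarize(records):
    """Count records per migration category and list snippets that fail to parse."""
    per_category = Counter(r["migration_category"] for r in records)
    syntax_failures = [
        (r["base_case_id"], r["variation_id"], key)
        for r in records
        for key in ("prequantum_code", "postquantum_code")
        if not is_valid_python(r[key])
    ]
    return per_category, syntax_failures
```

Calling load_records on the dataset's .jsonl file (the file name is not given in the description) and passing the result to summarize would return a per-category counter and a list of (base_case_id, variation_id, field) tuples for snippets that do not parse. Parsing with ast avoids executing untrusted code, so this covers only the static half of validation; checking functional equivalence would additionally require running each pair in an isolated environment with the relevant cryptographic libraries installed.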
|