🏊 💆🏿 🚯 Quantas primitivas são necessárias para implementar um sistema de forte? 👩🏽‍🏫 🐘 📔

Em 1992, outra competição para programação ofuscada na linguagem C foi realizada . Um dos projetos apresentados foi um sistema de pequeno forte . Ocorreu-me que a máquina virtual foi implementada em apenas 794 bytes de código C. O resto do sistema do forte foi carregado da fonte no forte. Depois de estudar o projeto, a empolgação inicial deu lugar à decepção, já que o autor usou um truque não totalmente "honesto": ele usou a função scanf () para analisar o código-fonte fortificado. Daquele momento em diante, fui atormentado pela pergunta - quantos primitivos são necessários para implementar um sistema de forte sem tais truques?

Depois de um tempo, fui apresentado ao sistema de fortaleza sod32 , escrito por Lennart Benschop . Este sistema de fort possui uma máquina virtual escrita em C, e a imagem binária que é carregada na máquina virtual não depende da ordem de bytes no sistema de host de palavras. sod32 tem 32 primitivos, aqui estão eles:


            
              
              
                
                NOOP 
                ( --- ) 
                Do nothing 
                
                
                SWAP 
                ( x1 x2 --- x2 x1 ) 
                Swap the two top items on the stack. 
                
                
                ROT 
                ( x1 x2 x3 --- x2 x3 x1 ) 
                Rotate the three top items on the stack. 
                
                
                0= 
                ( x --- f) 
                f is true if and only if x is 0. 
                
                
                NEGATE 
                ( n1 --- -n1) 
                Negate top number on the stack. 
                
                
                UM* 
                ( u1 u2 --- ud ) 
                Multiply two unsigned numbers, giving double result. 
                
                
                C@ 
                ( c-addr --- c) 
                Fetch character c at c-addr. 
                
                
                @ 
                ( a-addr --- x) 
                Fetch cell x at a-addr. 
                
                
                + 
                ( w1 w2 --- w3)  
                Add the top two numbers on the stack. 
                
                
                AND 
                ( x1 x2 --- x3) 
                Bitwise and of the top two cells on the stack. 
                
                
                OR 
                ( x1 x2 --- x3) 
                Bitwise or of the top two cells on the stack. 
                
                
                XOR 
                ( x1 x2 --- x3) 
                Bitwise exclusive or of the top two cells on the stack. 
                
                
                U< 
                ( u1 u2 ---- f) 
                f is true if and only if unsigned number u1 is less than u2. 
                
                
                < 
                ( n1 n2 --- f) 
                f is true if and only if signed number n1 is less than n2. 
                
                
                LSHIFT 
                ( x1 u --- x2) 
                Shift x1 left by u bits, zeros are added to the right. 
                
                
                RSHIFT 
                ( x1 u --- x2) 
                Shift x1 right by u bits, zeros are added to the left. 
                
                
                UM/MOD 
                ( ud u1 --- urem uquot) 
                Divide the unsigned double number ud by u1, giving unsigned quotient and remainder. 
                
                
                +CY 
                ( n1 n2 cy1 --- n3 cy2) 
                Add n1 and n2 and the carry flag cy1 (must be 0 or 1) giving sum n3 and carry flag cy2. (n3 cy2 can be seen as a double number) 
                
                
                SCAN1 
                ( x d --- u) 
                Scan x for the first 1 bit. u is the position of that bit (counted from the scan direction) and 32 if x=0. d is the scan direction, 0 is left-to-right, 1 is right-to-left. 
                
                
                SPECIAL 
                ( x ---) 
                Any of a large number of special instructions, indicated by x. 
                
                
                DROP 
                ( x ---) 
                Discard the top item on the stack. 
                
                
                >R 
                ( x ---) 
                Push x on the return stack. 
                
                
                C!A 
                ( c c-addr --- c-addr) 
                Store character c at c-addr, keep address. 
                
                
                !A 
                ( x a-addr --- a-addr) 
                Store cell x at a-addr, keep address. 
                
                
                DUP 
                ( x --- x x ) 
                Duplicate the top cell on the stack. 
                
                
                OVER 
                ( x1 x2 --- x1 x2 x1) 
                Copy the second cell of the stack. 
                
                
                R@ 
                ( --- x) 
                x is a copy of the top of the return stack. 
                
                
                R> 
                ( --- x) 
                Pop the top of the return stack and place it on the stack. 
                
                
                0 
                ( --- 0) 
                The constant number 0. 
                
                
                1 
                ( --- 1) 
                The constant number 1. 
                
                
                4 
                ( --- 4) 
                The constant number 4. 
                
                
                LIT 
                ( --- lit) 
                Push literal on the stack (literal number is in-line).

Percebi que tinha a chance de encontrar uma resposta para minha pergunta e comecei a transformar primitivas em definições de alto nível. Quero salientar desde já que toda essa atividade tem um sentido puramente acadêmico. É improvável que seja possível aplicar os resultados obtidos na prática devido à perda de desempenho. Enquanto fazia experiências, acompanhei o tamanho do binário do sistema de fort e o tempo de execução do conjunto de testes de John Hayes. Eu construí uma nova imagem binária com o comando

echo 'S" extend-cross.4" INCLUDED '|./sod32 kernel.img

e executei os testes assim:

time ./sod32 kernel.img <<< $(echo -e 'S" tester.fr" INCLUDED\n12345\nBYE')

Na tabela abaixo, você pode ver como cada mudança afetou o tamanho e o desempenho. Links da coluna "alterar" levam ao commit correspondente no github.

o troco	tamanho do kernel.img	tempo de execução tester.fr
sod original 32	10164	0m0.059s
lshift, rshift	10312	0m0.071s
+, um *, um / mod	11552	0m0.123s
c @, c! a	11952	0m0.795s
0 =, negar <, u <	12340	0m2.800s
solta	12784	0m5.022s
trocar, apodrecer, sobre	13436	0m5.860s
sp @, sp!, rp @, rp!, dup	13680	0m8.696s
r @, r>,> r	14160	0m15.463s
e, ou, xor	14336	0m21.198s
0, 1, 4	15236	0m21.671s
0 ramo	15912	0m41.765s

Como resultado, o tamanho da imagem binária do sistema do forte aumentou de 10164 para 15912 (+ 57%), o desempenho caiu 708 vezes (quase 3 ordens de magnitude). Talvez o desempenho possa ser melhorado criando o perfil do código e otimizando os gargalos, mas tenho certeza de que o resultado ainda será muito lento em comparação com o sod32 original.

Com 32 primitivas mais lógica adicional no interpretador interno, acabei com 7: nop, nand,!, @, Um +, especial, lit. O interpretador interno retém a lógica para executar primitivas e definições de alto nível (chamada), bem como a lógica para completar a definição de alto nível (próxima). Eu encontrei a resposta para minha pergunta: um sistema de forte pode ser construído com base em 9 primitivos (ou 8, se nop não precisa ser um primitivo). Estou confuso com o fato de que existem até 3 primitivas para acesso à memória: @,! e aceso, mas não descobri como evitá-lo. Eu poderia muito bem ter perdido algo, então se você sabe como se livrar de um grande número de primitivos, por favor, escreva nos comentários.

NOOP	( --- )	Do nothing
SWAP	( x1 x2 --- x2 x1 )	Swap the two top items on the stack.
ROT	( x1 x2 x3 --- x2 x3 x1 )	Rotate the three top items on the stack.
0=	( x --- f)	f is true if and only if x is 0.
NEGATE	( n1 --- -n1)	Negate top number on the stack.
UM*	( u1 u2 --- ud )	Multiply two unsigned numbers, giving double result.
C@	( c-addr --- c)	Fetch character c at c-addr.
@	( a-addr --- x)	Fetch cell x at a-addr.
+	( w1 w2 --- w3)	Add the top two numbers on the stack.
AND	( x1 x2 --- x3)	Bitwise and of the top two cells on the stack.
OR	( x1 x2 --- x3)	Bitwise or of the top two cells on the stack.
XOR	( x1 x2 --- x3)	Bitwise exclusive or of the top two cells on the stack.
U<	( u1 u2 ---- f)	f is true if and only if unsigned number u1 is less than u2.
<	( n1 n2 --- f)	f is true if and only if signed number n1 is less than n2.
LSHIFT	( x1 u --- x2)	Shift x1 left by u bits, zeros are added to the right.
RSHIFT	( x1 u --- x2)	Shift x1 right by u bits, zeros are added to the left.
UM/MOD	( ud u1 --- urem uquot)	Divide the unsigned double number ud by u1, giving unsigned quotient and remainder.
+CY	( n1 n2 cy1 --- n3 cy2)	Add n1 and n2 and the carry flag cy1 (must be 0 or 1) giving sum n3 and carry flag cy2. (n3 cy2 can be seen as a double number)
SCAN1	( x d --- u)	Scan x for the first 1 bit. u is the position of that bit (counted from the scan direction) and 32 if x=0. d is the scan direction, 0 is left-to-right, 1 is right-to-left.
SPECIAL	( x ---)	Any of a large number of special instructions, indicated by x.
DROP	( x ---)	Discard the top item on the stack.
>R	( x ---)	Push x on the return stack.
C!A	( c c-addr --- c-addr)	Store character c at c-addr, keep address.
!A	( x a-addr --- a-addr)	Store cell x at a-addr, keep address.
DUP	( x --- x x )	Duplicate the top cell on the stack.
OVER	( x1 x2 --- x1 x2 x1)	Copy the second cell of the stack.
R@	( --- x)	x is a copy of the top of the return stack.
R>	( --- x)	Pop the top of the return stack and place it on the stack.
0	( --- 0)	The constant number 0.
1	( --- 1)	The constant number 1.
4	( --- 4)	The constant number 4.
LIT	( --- lit)	Push literal on the stack (literal number is in-line).

Quantas primitivas são necessárias para implementar um sistema de forte?

More articles: