I thought this might be of interest: it's code to parse a number written in Roman numerals and convert it to conventional Arabic numerals. It's written in a Microsoft proprietary language called C/AL, based on Pascal - I appreciate that most readers won't be familiar with it, but hopefully it's still readable as pseudo-code.
There are a few quirks:
- C/AL provides a textual data type called Code, which forces any text input into upper case. This neatly enabled me to avoid explicit case conversions or making the code handle both cases.
- C/AL also provides a data type called Option, which I've used for the character types (Unit or Five). It's used for small, fixed lists of values.
- The code provided assumes that the first element in arrays has index 0 (C/AL arrays actually start at 1), as this makes things neater and is true of many common languages.
- I've assumed that the largest valid number to be handled is 3,999 (MMMCMXCIX), as I'm not aware of a Roman numeral for 5,000 and I didn't want to handle a special case of MMMM for 4,000.
- I've also assumed that numbers such as 99 and 999 are only validly represented as XCIX and CMXCIX respectively, rather than the possibly valid IC and IM.
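The last two assumptions can be sanity-checked by going the other way, from Arabic to Roman. The following is a minimal Python sketch, not part of the C/AL code: `to_roman` and `PAIRS` are hypothetical names, and it uses the usual greedy value table. It confirms the numeral for 3,999, the canonical forms of 99 and 999, and that no valid numeral in range is longer than 15 characters.

```python
# Hypothetical helper, not from the original post: builds the canonical
# Roman numeral for n by greedily taking the largest value that still fits.
PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    if not 1 <= n <= 3999:
        raise ValueError(f"out of range: {n}")
    out = []
    for value, symbol in PAIRS:
        count, n = divmod(n, value)  # how many times this symbol fits
        out.append(symbol * count)
    return "".join(out)


# Longest canonical numeral in the supported range.
longest = max((to_roman(n) for n in range(1, 4000)), key=len)

print(to_roman(3999))               # MMMCMXCIX
print(to_roman(99), to_roman(999))  # XCIX CMXCIX
print(longest, len(longest))        # MMMDCCCLXXXVIII 15
```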
PROCEDURE ProcessRomanNum(RomanNum: Code[20]): ArabicNum: Integer;
VAR
  RomNumLen: Integer;
  i: Integer;
  ArabDigit: ARRAY [4] OF Integer;
  ArabNumPower: Integer;
  CurrChar: Code[1];
  CurrCharPower: Integer;
  CurrCharType: 'Unit,Five';
BEGIN
  ArabicNum := 0;
  ArabNumPower := 4; // one above M (power 3): no digit started yet
  RomNumLen := STRLEN(RomanNum);
  IF RomNumLen = 0 THEN
    ERROR('No string');
  IF RomNumLen > 15 THEN // longest valid numeral is MMMDCCCLXXXVIII (3,888)
    ERROR('String too long');
  FOR i := 0 TO 3 DO
    ArabDigit[i] := 0;
  FOR i := 1 TO RomNumLen DO BEGIN
    CurrChar := COPYSTR(RomanNum, i, 1);
    ConvertRomanDigit(CurrChar, CurrCharPower, CurrCharType);
    IF CurrCharPower > ArabNumPower THEN BEGIN
      IF (CurrCharPower > (ArabNumPower + 1)) OR (ArabDigit[ArabNumPower] <> 1) OR (CurrCharType = CurrCharType::Five) THEN
        ERROR('Bad sequence: %1', COPYSTR(RomanNum, i-1, 2));
      // only valid combinations: IX, XC, CM
      ArabDigit[ArabNumPower] := 9;
    END ELSE BEGIN
      IF CurrCharPower < ArabNumPower THEN
        ArabNumPower := CurrCharPower;
      IF CurrCharType = CurrCharType::Five THEN BEGIN
        CASE ArabDigit[ArabNumPower] OF
          0:
            ArabDigit[ArabNumPower] := 5;
          1: // combinations: IV, XL, CD
            ArabDigit[ArabNumPower] := 4;
          ELSE
            ERROR('Bad sequence: %1', COPYSTR(RomanNum, i-1, 2));
        END;
      END ELSE BEGIN // CurrCharType = Unit
        CASE ArabDigit[ArabNumPower] OF
          3, 8:
            ERROR('Too many consecutive units: %1', CurrChar);
          4, 9:
            ERROR('Bad sequence: %1', COPYSTR(RomanNum, i-2, 3));
          ELSE
            ArabDigit[ArabNumPower] += 1;
        END;
      END;
    END;
  END;
  FOR i := 0 TO 3 DO
    ArabicNum += ArabDigit[i] * POWER(10, i);
END;

PROCEDURE ConvertRomanDigit(TestChar: Code[1];VAR Power : Integer;VAR Type: 'Unit,Five');
BEGIN
  CASE TestChar OF
    'C', 'I', 'M', 'X':
      Type := Type::Unit;
    'D', 'L', 'V':
      Type := Type::Five;
    ELSE
      ERROR('Bad character: %1', TestChar);
  END;
  CASE TestChar OF
    'I', 'V':
      Power := 0;
    'L', 'X':
      Power := 1;
    'C', 'D':
      Power := 2;
    'M':
      Power := 3;
  END;
END;
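For anyone who would rather run the logic than read C/AL, here is a rough Python translation of the two procedures above. It is a sketch under a few assumptions: the names (`process_roman_num`, `DIGITS`, `cur`) are mine, `ValueError` stands in for C/AL's ERROR, an explicit `upper()` stands in for the Code type's automatic upper-casing, and the character lookup is folded into a dictionary rather than a separate procedure.

```python
# Rough Python translation of the C/AL logic; names are adapted and
# hypothetical, not taken from the original code.
UNIT, FIVE = "Unit", "Five"

# character -> (power of ten, character type), as in ConvertRomanDigit
DIGITS = {
    "I": (0, UNIT), "V": (0, FIVE),
    "X": (1, UNIT), "L": (1, FIVE),
    "C": (2, UNIT), "D": (2, FIVE),
    "M": (3, UNIT),
}


def process_roman_num(roman: str) -> int:
    roman = roman.upper()  # stands in for the Code type's upper-casing
    if not roman:
        raise ValueError("No string")
    if len(roman) > 15:
        raise ValueError("String too long")

    digit = [0, 0, 0, 0]  # one decimal digit per power of ten
    cur = 4               # power currently being filled; 4 = none yet

    for i, ch in enumerate(roman):
        if ch not in DIGITS:
            raise ValueError(f"Bad character: {ch}")
        power, kind = DIGITS[ch]
        if power > cur:
            # only valid combinations: IX, XC, CM
            if power > cur + 1 or digit[cur] != 1 or kind == FIVE:
                raise ValueError(f"Bad sequence: {roman[i - 1:i + 1]}")
            digit[cur] = 9
        else:
            cur = power  # no change when power == cur
            if kind == FIVE:
                if digit[cur] == 0:
                    digit[cur] = 5
                elif digit[cur] == 1:  # combinations: IV, XL, CD
                    digit[cur] = 4
                else:
                    raise ValueError(f"Bad sequence: {roman[i - 1:i + 1]}")
            else:  # kind == UNIT
                if digit[cur] in (3, 8):
                    raise ValueError(f"Too many consecutive units: {ch}")
                if digit[cur] in (4, 9):
                    raise ValueError(f"Bad sequence: {roman[i - 2:i + 1]}")
                digit[cur] += 1

    return sum(d * 10 ** p for p, d in enumerate(digit))


print(process_roman_num("MCMXCIV"))    # 1994
print(process_roman_num("mmmcmxcix"))  # 3999
```

Inputs such as IC, IM, IIII and MMMM raise `ValueError` here, matching the assumptions listed at the top of the post.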
Readers are welcome to pick this code up and use it, translating to other languages as they wish. Credit and/or a post here to say so would be appreciated but aren't required. I'm reasonably confident that the code correctly handles all valid inputs, and rejects all invalid ones, but I can't guarantee this, and people using it should carry out their own testing.
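One way to do that testing is to compare a translated parser against an independent check over every short string. The Python sketch below is only a suggestion: it uses a widely quoted regular expression for canonical numerals from 1 to 3,999 as the oracle, and `find_mismatches` assumes the parser under test is a callable that returns an integer for valid input and raises `ValueError` otherwise. `sloppy_parse` is a deliberately lax stand-in, included only to show the harness catching something.

```python
import re
from itertools import product

# Widely used pattern for canonical numerals 1..3999 (no IC, IM, IIII, MMMM).
CANONICAL = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)


def is_canonical(s: str) -> bool:
    # The pattern alone would match the empty string, so reject that first.
    return bool(s) and CANONICAL.fullmatch(s) is not None


def find_mismatches(parse, max_len: int = 4):
    """Return strings where `parse` and the regex disagree on validity.

    `parse` is assumed to return an int for valid input and raise
    ValueError for invalid input (an interface chosen for this sketch).
    """
    bad = []
    for n in range(1, max_len + 1):
        for chars in product("IVXLCDM", repeat=n):
            s = "".join(chars)
            try:
                parse(s)
                accepted = True
            except ValueError:
                accepted = False
            if accepted != is_canonical(s):
                bad.append(s)
    return bad


# Demo: a deliberately sloppy parser that never rejects anything.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def sloppy_parse(s: str) -> int:
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        v = VALUES[ch]
        # subtract when a larger symbol follows, otherwise add
        total += -v if VALUES.get(nxt, 0) > v else v
    return total


mismatches = find_mismatches(sloppy_parse, max_len=2)
print(len(mismatches), "IC" in mismatches, "IX" in mismatches)
```

A parser that follows the rules in this post should produce an empty list. The check covers validity only, so a round trip against known values is still worth adding for the numbers themselves.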
EDIT 17/07/21: removed the spoiler tags, corrected the Roman numeral for 3999 (!)