Roman numeral converter

Started by Glaurung, August 07, 2016, 01:00:49 AM

Previous topic - Next topic

Glaurung

I thought this might be of interest: it's code to parse a number written in Roman numerals and convert it to conventional Arabic numerals. It's written in a Microsoft proprietary language called C/AL, based on Pascal - I appreciate that most readers won't be familiar with it, but hopefully it's still readable as pseudo-code.

There are a few quirks:
- C/AL provides a textual data type called Code, which forces any text input into upper case. This neatly enabled me to avoid explicit case conversions or making the code handle both cases.
- C/AL also provides a data type called Option, which I've used for the character types (Unit or Five). It's used for small, fixed lists of values.
- The code provided assumes that the first element in arrays has index 0, as this makes things neater and is true of many common languages.
- I've assumed that the largest valid number to be handled is 3,999 (MMMCMXCXIX) as I'm not aware of a Roman numeral for 5,000 and I didn't want to handle a special case of MMMM for 4,000.
- I've also assumed that numbers such as 99 and 999 are only validly represented as XCIX and CMXCIX respectively, rather than the possibly valid IC and IM.


PROCEDURE ProcessRomanNum(RomanNum: Code[20]): ArabicNum: Integer;
VAR
  RomNumLen: Integer;
  i: Integer;
  ArabDigit: ARRAY [4] OF Integer;
  ArabNumPower: Integer;
  CurrChar: Code[1];
  CurrCharPower: Integer;
  CurrCharType: 'Unit,Five';
BEGIN
  ArabicNum := 0;
  ArabNumPower := 4;

  RomNumLen := STRLEN(RomanNum);
  IF RomNumLen = 0 THEN
    ERROR('No string');
  IF RomNumLen > 15 THEN
    ERROR('String too long');

  FOR i := 0 TO 3 DO
    ArabDigit[i] := 0;

  FOR i := 1 TO RomNumLen DO BEGIN
    CurrChar := COPYSTR(RomanNum, i, 1);
    ConvertRomanDigit(CurrChar, CurrCharPower, CurrCharType);

    IF CurrCharPower > ArabNumPower THEN BEGIN
      IF (CurrCharPower > (ArabNumPower + 1)) OR (ArabDigit[ArabNumPower] <> 1) OR (CurrCharType = CurrCharType::Five) THEN
        ERROR('Bad sequence: %1', COPYSTR(RomanNum, i-1, 2));

      // only valid combinations: IX, XC, CM
      ArabDigit[ArabNumPower] := 9;
    END ELSE BEGIN
      IF CurrCharPower < ArabNumPower THEN
        ArabNumPower := CurrCharPower;

      IF CurrCharType = CurrCharType::Five THEN BEGIN
        CASE ArabDigit[ArabNumPower] OF
          0:
            ArabDigit[ArabNumPower] := 5;
          1: // combinations: IV, XL, CD
            ArabDigit[ArabNumPower] := 4;
          ELSE
            ERROR('Bad sequence: %1', COPYSTR(RomanNum, i-1, 2));
        END;
      END ELSE BEGIN  // CurrCharType = Unit
        CASE ArabDigit[ArabNumPower] OF
          3, 8:
            ERROR('Too many consecutive units: %1', CurrChar);
          4, 9:
            ERROR('Bad sequence: %1', COPYSTR(RomanNum, i-2, 3));
          ELSE
            ArabDigit[ArabNumPower] += 1;
        END;
      END;
    END;
  END;

  FOR i := 0 TO 3 DO
   ArabicNum += ArabDigit[i] * POWER(10, i);
END;

PROCEDURE ConvertRomanDigit(TestChar: Code[1];VAR Power : Integer;VAR Type: 'Unit,Five');
BEGIN
  CASE TestChar OF
    'C', 'I', 'M', 'X':
      Type := Type::Unit;
    'D', 'L', 'V':
      Type := Type::Five;
    ELSE
      ERROR('Bad character: %1', TestChar);
  END;

  CASE TestChar OF
    'I', 'V':
      Power := 0;
    'L', 'X':
      Power := 1;
    'C', 'D':
      Power := 2;
    'M':
      Power := 3;
  END;
END;


Readers are welcome to pick this code up and use it, translating to other languages as they wish. Credit and/or a post here to say so would be appreciated but aren't required. I'm reasonably confident that the code correctly handles all valid inputs, and rejects all invalid ones, but I can't guarantee this, and people using it should carry out their own testing.

EDIT 17/07/21: removed the spoiler tags, corrected the Roman numeral for 3999 (!)

Glaurung

Update time, almost five years later (gulp!). I've started learning Python, so translating my C/AL code to Python seemed like a good exercise. Here's what I've come up with - the logic should be pretty much the same, but I've tried to take advantage of Python features and move away from C/AL-specific constructions. If anyone with more Python experience would like to cast an eye over it and offer suggestions for how to make it more elegant, I'd be grateful. In particular, some of the extended 'if' statements seem rather clunky in a Python context, but I can't yet see how to make them cleaner.

A few points to save people asking:
- My Python environment is the latest version, 3.9.6 - though the book I'm working from is the O'Reilly Learning Python, so it's only up to date to version 3.3. I'm not currently expecting to need Python 2.X so I'm not making any particular effort to learn 2.X differences.
- The ConvertNum function is designed to accept a string argument (but should cope with any variable type), and return a tuple consisting of a boolean value (is the input a valid Roman numeral?), and either the corresponding Arabic number or a string saying why the input is invalid.
- I'm using the 'for i in range' construction as the meat of the ConvertNum function because I need to be able to access earlier parts of the input string for error messages. If there's a more elegant way to write 'for letter in string' while still being able to take a larger slice of the string, I'd like to know.


def ConvertNum(RomNum):
   RomNum = str(RomNum).upper()
   RomNumLen = len(RomNum)
   if RomNumLen < 1:
      return(False, 'Bad input - empty string')
   if RomNumLen > 15:
      return(False, 'Bad input - string too long: ' + RomNum)

   ArDigit = [0,0,0,0]

   ArNumPower = 4
   for i in range(RomNumLen):
      CurrRomDigit = RomNum[i]
      ValidDigit, DigitResult = ConvertDigit(CurrRomDigit)
      if not ValidDigit:
         return(False, DigitResult)

      CurrRomPower, CurrRomIsFive = DigitResult
      if CurrRomPower > ArNumPower:
         if CurrRomIsFive or (CurrRomPower > ArNumPower + 1) or (ArDigit[ArNumPower] != 1):
            return(False, 'Bad input - invalid sequence: ' + RomNum[i-1:i+1])
         # valid sequences: IX, XC, CM
         ArDigit[ArNumPower] = 9
      else:
         ArNumPower = min(ArNumPower, CurrRomPower)
         if CurrRomIsFive:
            if ArDigit[ArNumPower] == 0:
               ArDigit[ArNumPower] = 5
            elif ArDigit[ArNumPower] == 1: # valid sequences: IV, XL, CD
               ArDigit[ArNumPower] = 4
            else:
               return(False, 'Bad input - invalid sequence: ' + RomNum[i-1:i+1])
         else:
            if ArDigit[ArNumPower] in (3,8):
               return(False, 'Bad input - too many consecutive units: ' + CurrRomDigit)
            elif ArDigit[ArNumPower] in (4,9):
               return(False, 'Bad input - invalid sequence: ' + RomNum[i-2:i+1])
            else:
               ArDigit[ArNumPower] += 1

   ArNum = 0
   for i in range(4):
      ArNum += ArDigit[i] * (10 ** i)
   return(True, ArNum)


def ConvertDigit(RomDigit):
   RomDigit = str(RomDigit).upper()
   if not RomDigit in 'IVXLCDM':
      Result = 'Bad input - invalid character: ' + RomDigit
      return (False, Result)

   IsFive = (RomDigit in 'VLD')

   if RomDigit in 'IV':
      ArPower = 0
   elif RomDigit in 'XL':
      ArPower = 1
   elif RomDigit in 'CD':
      ArPower = 2
   else:
      ArPower = 3

   Result = (ArPower, IsFive)
   return (True, Result)

Jubal

I'll have to copy that through and have a look at it. It may say something about my programming that I probably use python more than any other language and I never actually use if... in as a system despite it probably being quite effective for a lot of what I do...
The duke, the wanderer, the philosopher, the mariner, the warrior, the strategist, the storyteller, the wizard, the wayfarer...

Glaurung

#3
Thanks - I'll be interested to hear your feedback.


EDIT 20/07/21:
For your further delectation and delight, a variation on the previous logic, where I have structured the permitted Roman numerals as a dictionary:

RomDigits = {'I': (0, False), 'V': (0, True), 'X': (1, False), 'L': (1, True), 'C': (2, False), 'D': (2, True), 'M': (3, False)}

def ConvertNum(RomNum):
   RomNum = str(RomNum).upper()
   RomNumLen = len(RomNum)
   if RomNumLen < 1:
      return(False, 'Bad input - empty string')
   if RomNumLen > 15:   # longest valid string is MMMDCCCLXXXVIII = 3888
      return(False, 'Bad input - string too long: ' + RomNum)

   ArDigit = [0,0,0,0]
   ArNumPower = 4
   for i in range(RomNumLen):
      DigitValid, ConvResult = ConvertDigit(RomNum[i])
      if not DigitValid:
         return(False, ConvResult)

      CurrRomPower, CurrRomIsFive = ConvResult
      if CurrRomPower > ArNumPower:
         if CurrRomIsFive or (CurrRomPower > ArNumPower + 1) or (ArDigit[ArNumPower] != 1):
            return(False, 'Bad input - invalid sequence: ' + RomNum[i-1:i+1])
         # valid sequences: IX, XC, CM
         ArDigit[ArNumPower] = 9
      else:
         ArNumPower = min(ArNumPower, CurrRomPower)
         if CurrRomIsFive:
            if ArDigit[ArNumPower] == 0:
               ArDigit[ArNumPower] = 5
            elif ArDigit[ArNumPower] == 1: # valid sequences: IV, XL, CD
               ArDigit[ArNumPower] = 4
            else:
               return(False, 'Bad input - invalid sequence: ' + RomNum[i-1:i+1])
         else:
            if ArDigit[ArNumPower] in (3,8):
               return(False, 'Bad input - too many consecutive units: ' + RomNum[i])
            elif ArDigit[ArNumPower] in (4,9):
               return(False, 'Bad input - invalid sequence: ' + RomNum[i-2:i+1])
            else:
               ArDigit[ArNumPower] += 1

   ArNum = 0
   for i in range(4):
      ArNum += ArDigit[i] * (10 ** i)
   return(True, ArNum)


def ConvertDigit(Digit):
   Digit = str(Digit).upper()
   IsValid = (Digit in RomDigits)
   if IsValid:
      ConvResult = RomDigits[Digit]
   else:
      ConvResult = 'Bad input - invalid character: ' + Digit
   return (IsValid, ConvResult)


This makes the ConvertDigit function a lot shorter and neater, but it feels somehow less satisfying, because much of the logic is now embedded in the dictionary definition rather than being explicit in the code.


EDIT 21/07/21: Well, that'll teach me - quite literally. A bit more reading in the Python book has brought me to the 'enumerate' function, which provides exactly the combination of "index within a string" and "character at that index" that I was looking for a few days ago. However, having tried it out, in this case I don't feel it makes the code any more elegant. I'll keep it in mind, though, in case I run into a more complex iterable and/or logic where it might be useful.