
ChatGPT, Datasette-Extract, and the US Ham Radio General Exam Question Pool

I started a project, ahem, yesterday to 'quickly' see if ChatGPT could read the entire United States general class amateur radio exam question pool into a Datasette instance using the datasette-extract plugin. As of this morning, I haven't been able to coax ChatGPT (using the gpt-4-turbo model) into doing the whole job. My rather raw notes are captured below. The short version is that I was never able to get the AI to capture more than 19 questions at a time. I'm hopeful that the pool could be moved into a database table iteratively, but for now, I've run out of time for this quick project :)

Occasionally ChatGPT seemed to hallucinate part of its process into the table.


Notes Follow

I'm going to track how easy it is to get the general exam question pool into a database using the datasette-extract plugin. I started this endeavor at 20:37 UTC.


Get my existing OpenAI API key ready to go

20:43: Done. As usual with OpenAI, the hardest part was finding login screens and then the API. Finally did a Google search to find the API.


Install the datasette-extract plugin

I've run into an issue here. I think I have too old a version of Datasette, and Windows can't figure out how to uninstall it.

Using cached datasette_extract-0.1a6-py3-none-any.whl (815 kB)

Using cached datasette-1.0a13-py3-none-any.whl (302 kB)

Using cached datasette_secrets-0.1a4-py3-none-any.whl (12 kB)

Installing collected packages: datasette, datasette-secrets, datasette-extract

  Attempting uninstall: datasette

    Found existing installation: datasette 1.0a3

    Uninstalling datasette-1.0a3:

ERROR: Could not install packages due to an OSError: [WinError 32] The process cannot access the file because it is being used by another process: 'c:\\users\\m3n7es\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\\localcache\\local-packages\\python311\\scripts\\datasette.exe'

Check the permissions.

I'll clone a dev environment for the plugin and then run it in a venv. Time now 21:00.

Still Installing

21:05 OK! pytest passes!

Adding Table Column Names

This is easy since I've already got a table for the general exam pool. The headings are:

id question class subelement group_index group_number answer answer_a answer_b answer_c answer_d

21:21 The column names have been defined with hints.

id: primary key

question: follows a line starting with G, ends with '?'

class: defaults to G for every question

subelement: a number following G before a second letter

group_index: the letter following the subelement's number (G)(\d)(A-Z)(\d\d) Use \$3

group_number: two-digit number following group_index (G)(\d)(A-Z)(\d\d) use \$4

answer: a single letter between parentheses that indicates the correct answer choice

answer_a: next line starting with 'A.'

answer_b: next line starting with 'B.'

answer_c: next line starting with 'C.'

answer_d: next line starting with 'D.'

I added these additional instructions:

The questions and answers are in line sorted by headings that contain class (always G), then subelement (a single digit following G), then group_index (a single letter following the subelement), then group_number (a question number within the group_index), then the single letter correct answer enclosed in parentheses. The next line contains the entire question text for the question field. The next four lines in each question contain the four possible answers. The end of each question is denoted by '~~'.
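
Just to keep the structure straight in my own head, here's a rough Python sketch of the record layout those hints and instructions describe. This is mine, not anything datasette-extract runs, and the heading regex is my own guess at how the pool text is laid out:

import re

# Rough sketch of the question format described above; the heading pattern is
# my assumption about the pool text, not anything the plugin itself uses.
HEADING = re.compile(r"^(G)(\d)([A-Z])(\d\d)\s*\(([A-D])\)")

sample = """G1A01 (C)
On which HF and/or MF amateur bands are there portions where General class licensees cannot transmit?
A. 60 meters, 30 meters, 17 meters, and 12 meters
B. 160 meters, 60 meters, 15 meters, and 12 meters
C. 80 meters, 40 meters, 20 meters, and 15 meters
D. 80 meters, 20 meters, 15 meters, and 10 meters
~~"""

def parse_question(block):
    lines = [line for line in block.strip().splitlines() if line.strip() != "~~"]
    heading = HEADING.match(lines[0])
    return {
        "class": heading.group(1),         # always G
        "subelement": heading.group(2),    # single digit following G
        "group_index": heading.group(3),   # letter following the subelement
        "group_number": heading.group(4),  # two-digit question number
        "answer": heading.group(5),        # correct choice, from the parentheses
        "question": lines[1],
        "answer_a": lines[2][3:],          # drop the leading "A. "
        "answer_b": lines[3][3:],
        "answer_c": lines[4][3:],
        "answer_d": lines[5][3:],
    }

print(parse_question(sample))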

I've copied the entire question pool into the tool. Now, I'll press 'Extract'.



Time is 21:26 UTC

Extracting to Table

Got back this error message:

Error: Error code: 404 - {'error': {'message': 'The model `gpt-4-turbo` does not exist or you do not have access to it.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

Extraction failed


OK. Looking at my OpenAI account, I see no gpt-4-turbo available. So, that's a bit of a challenge.
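
Before poking around in the dashboard any further, I could have just asked the API which models this key can see; a minimal sketch, assuming the openai package is installed and OPENAI_API_KEY is set in the environment:

from openai import OpenAI

# Minimal check: which gpt-4 family models can this API key actually use?
client = OpenAI()
for model in client.models.list():
    if "gpt-4" in model.id:
        print(model.id)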

OK! The API is like using a Clipper Card on BART. You have to pay up front.


I put some money in the account.

I'll try to extract again. It's 21:43.

It's Working!!!

[
  {
    "id": 1,
    "question": "On which HF and/or MF amateur bands are there portions where General class licensees cannot transmit?",
    "class": "G",
    "subelement": "G1",
    "group_index": "A",
    "group_number": "01",
    "answer": "C",
    "answer_a": "60 meters, 30 meters, 17 meters, and 12 meters",
    "answer_b": "160 meters, 60 meters, 15 meters, and 12 meters",
    "answer_c": "80 meters, 40 meters, 20 meters, and 15 meters",
    "answer_d": "80 meters, 20 meters, 15 meters, and 10 meters"
  },
  {
    "id": 2,
    "question": "On which of the following bands is phone operation prohibited?",

The engine is still cranking along at 21:47.

And Then </exceeds>

  {
    "id": 19,
    "question": "When is it permissible to communicate with amateur stations in countries outside the areas administered by the Federal Communications Commission?",
    "class": "G",
    "subelement": "G1",
    "group_index": "B",
    "group_number": "08",
    "answer": "B",
    "answer_a": "Only when the foreign country has a formal third-party agreement filed with the FCC",
    "answer_b": "When the contact is with amateurs in any country except those whose administrations have notified the ITU that they object to such communications",
    "answer_c": "Only when the contact is with amateurs licensed by a country whic...  Click to expand ... <exceeds maximum number of characters> ,,groupId,,quizzes,,element,,data,,result,,direct,,[]}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]}</exceeds>}]}]},"
  }
]

Did I hit the end of my billing envelope?

21:51 No, billing seems fine. I wonder if I need to add the file in as a pdf because of this message:

exceeds maximum number of characters

Trying again with a pdf file

21:59 Dropping in a pdf file resulted in a 'Processing...' message for the last 8 minutes. Trying this one subelement (or rather one subelement_group, since it never completed a full subelement) at a time.
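
If hand-splitting gets old, a rough helper like this (my own sketch; it assumes group headings in the pool text look like "G1A – ..." while question headings look like "G1A01") could carve the text up one subelement group at a time:

import re

# Rough helper sketch: split the plain-text pool into one chunk per subelement
# group, so each paste into the extract tool stays under the character limit.
# Matches "G1A" at the start of a line only when NOT followed by a digit,
# which should catch group headings but skip question headings like "G1A01".
GROUP_HEADING = re.compile(r"^G\d[A-Z](?!\d)", re.MULTILINE)

def split_by_group(pool_text):
    starts = [m.start() for m in GROUP_HEADING.finditer(pool_text)]
    for begin, end in zip(starts, starts[1:] + [len(pool_text)]):
        yield pool_text[begin:end].strip()

# Usage idea: feed each group to the tool separately.
# for group in split_by_group(open("general_pool.txt").read()):
#     print(group.splitlines()[0])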

Full Subelement at a time

Back up and running at 22:01.

Well, shucks, that time it only pulled out two questions. Also, it didn't create the table even though it said it did:

Error 404

Table not found: ham_exam_general_question_pool


I'll try a db that doesn't revolve around a memory table next.
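
The rough idea, sketched with sqlite-utils (the database file name is a placeholder; the table name is the one from the 404 above, and the columns are the ones I set up earlier):

import sqlite_utils

# Sketch: pre-create a file-backed SQLite table so extracted rows persist,
# instead of landing in Datasette's in-memory database. The file name is a
# placeholder; the columns match the ones defined earlier in these notes.
db = sqlite_utils.Database("ham_exam.db")
db["ham_exam_general_question_pool"].create(
    {
        "id": int,
        "question": str,
        "class": str,
        "subelement": str,
        "group_index": str,
        "group_number": str,
        "answer": str,
        "answer_a": str,
        "answer_b": str,
        "answer_c": str,
        "answer_d": str,
    },
    pk="id",
)

Then Datasette gets pointed at ham_exam.db instead of the default in-memory database.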

No Memory Table DBs

What could have been really bothersome was a breeze. The table columns auto-populated for me!

'Additional instructions' was not auto-populated, so WooooHooooo!!! for blogging. Meaning, I'm really happy I documented my instructions a few paragraphs back.

22:11 Pushed the 'Extract' button. Results started coming in a few seconds later.

Nuts! It got three questions out this time, but that's it! What's the difference in setups???

Adding Remaining SubElement Groups by Hand


Starting at 22:22

22:24 That worked. The entire G1A subelement group is in the table.

Can it do two subelement groups?

22:26 Input subelement group B and C

22:27 Both subelement groups have been successfully added.

The rest of the groups in the subelement?

Again, that's two subelement groups, D and E, but it only pulled out one question: the last one in the C group that I accidentally copied back in. Nuts!

Removed the row, removed the input, trying again at 22:32

Made it through the D subelement group and then stopped on

"G1E – Control categories; repeater regulations; third-party rules; ITU regions; automatically controlled digital station"

I think I see the game. I'll take out the group descriptions and add all the text in to see if I can be done with this. 23:34

Descriptions Removed

23:42 back up and running with all the descriptions removed. We'll see how this goes.

It's taking about four seconds per exam question to figure out the correct extraction.
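
If my memory of the pool size is right (a bit over 400 questions), four seconds each works out to something like half an hour for the whole pool.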

After "id": "G1E12", it decided it was done.

Remember how the ids started out as numbers? Weird.
Note: Updating the following morning. Not weird. I forgot to set the field type to integer.
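
For next time, sqlite-utils can also retype a column in place rather than re-creating the table; a sketch, using the placeholder database file and the table name from earlier:

import sqlite_utils

# Sketch: switch the id column to INTEGER before the next extraction run.
db = sqlite_utils.Database("ham_exam.db")
db["ham_exam_general_question_pool"].transform(types={"id": int})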

More Instructions

22:49
Added these additional instructions:

"When the subelement changes, or the subelement group changeds, keep going please. The end of the question pool is deonted by '~~~end of question pool text~~~' You're doing a great job, but please get every additional question this time."

and trying again.

22:49 Three questions have come back. It seems to be thinking now?

22:50 (Yes, I know it's not actually thinking.)

22:51 Calling this. Still at three additional questions.

Don't give away the ending

I took away the instruction about how to find the end of the pool, as well as the line about 'every additional question'.

22:54 Successfully crossed from G2A12 to G2B01

22:54 And now from G2B11 to G2C01

22:55 Stopped at G2C08. Why???

Did ChatGPT read the question? 'What prosign is sent to indicate the end of a formal message when using CW?'

22:59 Made the hop to G3A01 and then promptly decided it was done again.

There were two blank lines above that question rather than one. Is that why?

23:02 started it back up.

23:02 Stopped again at G3A14.

Again, there are three blank lines after this question rather than one.

23:05 Added 'The number of blank lines between questions is NOT significant.' to the Additional instructions.

Stopped two questions later at G3B02.

23:06 Starting again.

Two questions again. Taking away the last instruction.


23:38 So Tired
Got this error a few rows in.


After changing 'Additional instructions' to:

"IGNORE ALL BLANK LINES in content. Extract all data from content according to the following instructions. Rows will always begin with the pattern (G)(\d)([A-J])(\d\d)(\s*)([A-D]) and end with a line containing '~~' The questions and answers are in line sorted by headings that contain class (always G), then subelement (a single digit following G), then group_index (a single letter following the subelement), then group_number (a question number within the group_index), then the single letter correct answer enclosed in parentheses. The next line contains the entire question text for the question field. The next four lines in each question contain the four possible answers. The end of each question is denoted by '~~'"

Let's flush the state and start over

Looking above, the plug-in did as well as it ever did before I tried all the above experiments. One thing I hadn't realized (although I'd documented it) was that I accidentally changed the key to be text on my second try. I'm moving back to the original material copied in and the original instructions with a numeric key.

First, I tried without a new key and wound up only getting two questions back. Just as bad as ever.
Changing all the fields with numbers to integer resulted in one question.

I'm going to create a new OpenAI key and start on a clean database.

New database, new key, new table name wound up with 13 questions on the first try. I don't think I'
