Let's build Text-to-ML: an AutoML library that operates on natural language
Like a combination of HuggingGPT, LangChain and Python type inference
Hello frens
I did it about a year ago with a full-stack AI chatbot named Gdańsk AI, and I'm doing it again today with Text-to-ML: I build a thing to make some extra money and then I open-source it 🤷
Let's take this opportunity to go through a case study of building a small AutoML (automated machine learning) tool together
Essentially, what you will build is a Python library that expects the user to provide a question or a description in English and returns an answer as one of the Python types bool, str or int, depending on what kind of question or description the user provided
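To make the goal concrete, here is the kind of interface we are aiming for. The ai function is the one we will build at the end of this post; the example questions and return values are purely illustrative:

ai("is 17 a prime number?")                      # -> True (bool)
ai("how many planets are in the solar system?")  # -> 8 (int)
ai("summarize this article", article_text)       # -> str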
Under the hood, the pipeline starts by sending a request to an LLM to analyze the user's text, in order to retrieve the most important features of the request.
# The OpenAI client is assumed to be created earlier, e.g.
# from openai import OpenAI; client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[{"role": "user", "content": text}],
    tools=tools_run,
)
Those features are defined inside the tools parameter (formerly known as function calling)
tools_run = [
    {
        "type": "function",
        "function": {
            "name": "tools_run",
            "description": "User writes what they want to know, achieve or ask about. Choose what type of task the machine learning model should perform, based on the user's description of a model or of what the user expects from a model. Choose what type the return value should be, based on the user's query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "task_type": {
                        "type": "string",
                        "enum": [
                            "audio_classification",
                            "audio_to_audio",
                            "automatic_speech_recognition",
                            "text_to_speech",
                            "image_classification",
                            "image_segmentation",
                            "image_to_image",
                            "image_to_text",
                            "object_detection",
                            "text_to_image",
                            "zero_shot_image_classification",
                            "document_question_answering",
                            "visual_question_answering",
                            "conversational",
                            "feature_extraction",
                            "fill_mask",
                            "question_answering",
                            "sentence_similarity",
                            "summarization",
                            "table_question_answering",
                            "text_classification",
                            "text_generation",
                            "token_classification",
                            "translation",
                            "zero_shot_classification",
                            "tabular_classification",
                            "tabular_regression",
                        ],
                        "description": "Type of task the machine learning model should perform, based on the user's description of a model or of what the user expects from a model.",
                    },
                    "return_type": {
                        "type": "string",
                        "enum": ["boolean", "string", "number", "image", "audio"],
                        "description": "Decide what type of value corresponds the most to the output from the user's query.",
                    },
                    "answer": {
                        "type": "string",
                        "description": "If you are confident that you are able to fulfill the user's request yourself, so there is no need to create an ML model to fulfill it, put your answer into the field 'answer'. If you're not sure, leave this field empty.",
                    },
                },
                "required": ["task_type", "return_type"],
            },
        },
    },
]
This way, I expect the LLM to tell me which task type matches the user's query best (task_type) and what the return type of the response to the user's query should be (return_type). The list of available tasks comes from Hugging Face, because that's the platform we will use to search for the most suitable model. answer is an optional field, which I expect to be returned only if the LLM itself is able to provide a reliable response to the user's query
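As a minimal sketch (not necessarily the library's exact code), the tool call arguments can be read back from the completion like this, assuming the create call above was stored in response:

import json

# The model answers the tools_run "function" with a JSON string of arguments
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

task_type = args["task_type"]
return_type = args["return_type"]
answer = args.get("answer")  # optional: only present if the LLM answered directly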
The heuristic for searching for the most suitable model is pretty simple and therefore fallible. You can think of better ways of picking the right model and share them with me - or, even better, fork the repository and contribute to the Text-to-ML codebase
from huggingface_hub import InferenceClient

inference_client = InferenceClient()
# Look up the InferenceClient method whose name matches the task_type chosen by the LLM
method_to_call = getattr(inference_client, task_type, None)
if callable(method_to_call):
    call_key = data_to_inference_client_call_property(task_type)
    if call_key in ["audio", "image"]:
        # `file` is the list of files uploaded to the FastAPI endpoint (if any)
        file = await file[0].read()
    result = method_to_call(
        **{
            data_to_inference_client_call_property(task_type): file,
        }
    )
Let’s break this down
It's a hacky implementation of a Hugging Face Inference Endpoints call with InferenceClient. Since task_type matches the task names in InferenceClient exactly, we can call the method that is named just like our task_type. So, we'd like to do something like inference_client[task_type]().
Yet, this syntax doesn't work in Python. Instead, we can call getattr, providing the object we want to retrieve an attribute from (inference_client) and the name of the attribute, which in our case is a str value stored in task_type
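For illustration only, using text_classification as a stand-in for whatever task_type the LLM returned, these two calls are equivalent:

# Direct attribute access...
inference_client.text_classification("I love this library!")
# ...and the same method looked up by name at runtime
getattr(inference_client, "text_classification")("I love this library!")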
Then we check with callable(method_to_call) whether such a method exists in inference_client. If it does, we can go to the next step, which is building the parameters that our inference client method expects. These parameters differ between task types, so we can create a helper function that picks the right key
def data_to_inference_client_call_property(task_type):
    task_to_field = {
        "audio_classification": "audio",
        "audio_to_audio": "audio",
        "automatic_speech_recognition": "audio",
        "text_to_speech": "text",
        "image_classification": "image",
        "image_segmentation": "image",
        "image_to_image": "image",
        "image_to_text": "image",
        "object_detection": "image",
        "text_to_image": "text",
        "zero_shot_image_classification": "image",
        "document_question_answering": "document",
        "visual_question_answering": "image",
        "conversational": "text",
        "feature_extraction": "text",
        "fill_mask": "text",
        "question_answering": "question",
        "sentence_similarity": "text",
        "summarization": "text",
        "table_question_answering": "table",
        "text_classification": "text",
        "text_generation": "inputs",
        "token_classification": "text",
        "translation": "text",
        "zero_shot_classification": "text",
        "tabular_classification": "data",
        "tabular_regression": "data",
    }
    field = task_to_field.get(task_type, None)
    if field is not None:
        return field
    else:
        raise ValueError("Unsupported task type")
Once again, this dictionary comes from mapping Hugging Face documentation to our Python code
The value returned by data_to_inference_client_call_property serves as the key (the keyword argument name) when calling the inference endpoint, and the value is the file provided by the user (if any)
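For example, assuming the LLM picked image_classification and the user uploaded an image, the dictionary unpacking boils down to an ordinary keyword argument:

# **{"image": file} expands to image=file
result = method_to_call(image=file)
# which is the same as calling inference_client.image_classification(image=file)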
Okay. We’re ready to call the Hugging Face Inference API and run the model
result = method_to_call(
    **{
        data_to_inference_client_call_property(task_type): file,
    }
)
We get some output, but since we support many different tasks, we have to handle the variety of possible responses in a smart way
We can once again delegate the thinking part to the LLM by doing some basic prompt engineering and including the Hugging Face Inference Endpoint response, the user's original query and the expected return type
final_response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[
        {
            "role": "user",
            "content": f"Given an API response to the user's request, synthesize a response to the original user's request, taking the API response into account, in order to respond with the type specified below. Don't include anything in your response besides the single most important fact from the response, in the specified type. If possible, respond with a single word. If not, then limit the amount of words as much as possible, so as not to include any word that is not 100% necessary to provide the response. If a question is about numbers, respond with a numeric value only. API response: '{result}'. User's request: '{text}'. Response type: '{return_type}'",
        }
    ],
)
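The plain-text answer can then be read from the completion in the usual way before being sent back to the caller (a small sketch; the variable name answer_text is mine):

# The synthesized answer, constrained to the requested return type, as a plain string
answer_text = final_response.choices[0].message.content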
Good work, I guess. The code we just wrote allows us to perform multiple types of tasks, and if we wrap the output so it gets parsed into Python types, we can use our freshly created code as regular functions in Python apps. So let's do it!
import io
import os

import requests
from PIL import Image


def ai(query, data=None):
    payload = {"text": query}
    files = {}
    if isinstance(data, str) and (
        data.startswith("http://") or data.startswith("https://")
    ):
        # Remote file: download it and re-encode as PNG
        response = requests.get(data)
        if response.status_code == 200:
            img = Image.open(io.BytesIO(response.content))
            buf = io.BytesIO()
            img.save(buf, format="PNG")
            buf.seek(0)
            files["file"] = (
                "image.png",
                buf,
                "image/png",
            )
        else:
            return
    elif isinstance(data, str):
        payload["payload_text"] = data
    elif isinstance(data, (int, float)):
        payload["payload_text"] = str(data)
    elif isinstance(data, Image.Image):
        # In-memory PIL image: serialize it in its original format
        buf = io.BytesIO()
        image_format = data.format if data.format else "PNG"
        format_extension = image_format.lower()
        if format_extension == "jpeg":
            format_extension = "jpg"
        mime_type = f"image/{format_extension}"
        data.save(buf, format=image_format)
        buf.seek(0)
        files["file"] = (
            f"image.{format_extension}",
            buf,
            mime_type,
        )
    elif data is not None:
        # Local file path: guess the MIME type from the extension
        if os.path.isfile(data):
            mime_type = "application/octet-stream"
            if data.endswith(".mp3"):
                mime_type = "audio/mpeg"
            elif data.endswith(".mp4"):
                mime_type = "video/mp4"
            elif data.endswith(".wav"):
                mime_type = "audio/wav"
            files["file"] = (os.path.basename(data), open(data, "rb"), mime_type)
        else:
            return
    # endpoint() joins the API base URL with the route (helper defined elsewhere in the project)
    response = requests.post(
        endpoint(os.environ["API_URL"], "run"),
        data=payload,
        files=files,
    )
    response_json = response.json()
    # Map the textual response back onto Python types: bool first, then int,
    # falling back to the raw value (text or a file reference)
    if response_json == "true":
        return True
    if response_json == "false":
        return False
    try:
        return int(response_json)
    except ValueError:
        pass
    return response_json
I decided to copy-paste the whole function, because it’s quite straightforward once you see it
First, we parse an optional parameter called data, which can hold a URL to a file, text, numbers, a PIL image or a path to a local file. Once we have it, we can run the code we wrote before - here I have exposed that code as a POST endpoint using FastAPI
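For context, here is a minimal sketch of how such an endpoint could look. This is an assumption about the shape of the backend, not the project's exact code; the route name run and the field names text, payload_text and file simply mirror what the client above sends:

from typing import List, Optional

from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()


@app.post("/run")
async def run(
    text: str = Form(...),
    payload_text: Optional[str] = Form(None),
    file: Optional[List[UploadFile]] = File(None),
):
    # 1. send `text` to the LLM with tools_run to get task_type / return_type
    # 2. resolve the matching InferenceClient method and call it with the uploaded file
    # 3. synthesize the final answer with the second LLM call and return it as JSON
    ...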
Finally, when we get a response from our code, we can parse it into a proper Python type. We first check whether the received text maps to a bool value, then we try to build an int out of it. If that fails, we just return the received value, because it means it's either text or a file
And this is it. We can now run experiments on our code - see experiments.py and add your own.
Sample invocation:
ai("is there a dog on the image?", image)
And it will return a boolean with the value True if the image contains a dog
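Put together, a full call might look like this (photo.jpg is just a hypothetical local file):

from PIL import Image

image = Image.open("photo.jpg")  # any local photo
print(ai("is there a dog on the image?", image))  # -> True or False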
This is great, because now you can utilize the power of AI with literally zero configuration
Of course, the project is really basic and it can be much improved. Once again, feel welcome to contribute :)
Thanks for reading!