🐍 Python Client API Reference

Completion

Bases: APIEngine

Completion API. This API is used to generate text completions.

Language models are trained to understand natural language and predict text outputs as a response to their inputs. The inputs are called prompts and the outputs are referred to as completions. LLMs take the input prompts and chunk them into smaller units called tokens to process and generate language. Tokens may include trailing spaces and even sub-words; this process is language dependent.

The Completion API can be run either synchronously or asynchronously (via Python asyncio). In either mode, you can also choose whether to stream token responses.

create `classmethod`

create(
    model: str,
    prompt: str,
    max_new_tokens: int = 20,
    temperature: float = 0.2,
    stop_sequences: Optional[List[str]] = None,
    return_token_log_probs: Optional[bool] = False,
    timeout: int = 10,
    stream: bool = False,
) -> Union[
    CompletionSyncResponse,
    Iterator[CompletionStreamResponse],
]

Creates a completion for the provided prompt and parameters synchronously.

This API can be used to get the LLM to generate a completion synchronously. It takes as parameters the model (see Model Zoo) and the prompt. Optionally, it takes max_new_tokens, temperature, stop_sequences, return_token_log_probs, timeout, and stream. It returns a CompletionSyncResponse if stream=False, or an iterator of CompletionStreamResponse objects if stream=True; each response has request_id and output fields.

Parameters:

Name	Type	Description	Default
`model`	`str`	Name of the model to use. See Model Zoo for a list of Models that are supported.	required
`prompt`	`str`	The prompt to generate completions for, encoded as a string.	required
`max_new_tokens`	`int`	The maximum number of tokens to generate in the completion. The token count of your prompt plus `max_new_tokens` cannot exceed the model's context length. See Model Zoo for information on each supported model's context length.	`20`
`temperature`	`float`	What sampling temperature to use, in the range `[0, 1]`. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. When temperature is 0, greedy search is used.	`0.2`
`stop_sequences`	`Optional[List[str]]`	One or more sequences where the API will stop generating tokens for the current completion.	`None`
`return_token_log_probs`	`Optional[bool]`	Whether to return the log probabilities of generated tokens. When True, the response will include a list of tokens and their log probabilities.	`False`
`timeout`	`int`	Timeout in seconds. This is the maximum amount of time you are willing to wait for a response.	`10`
`stream`	`bool`	Whether to stream the response. If true, the return type is an `Iterator[CompletionStreamResponse]`. Otherwise, the return type is a `CompletionSyncResponse`. When streaming, tokens will be sent as data-only server-sent events.	`False`
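To make the temperature parameter concrete, here is a minimal, library-independent sketch of temperature sampling over a toy set of next-token scores: the logits are divided by the temperature before a softmax, and a temperature of 0 falls back to greedy (argmax) selection. The helper names `token_probabilities` and `sample_next_token` and the example logits are hypothetical; this illustrates the general technique and is not LLM Engine's server-side sampler.

```python
import math
import random
from typing import Dict, Optional


def token_probabilities(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax of logits / temperature (temperature must be > 0)."""
    if temperature <= 0:
        raise ValueError("use greedy selection for temperature == 0")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def sample_next_token(
    logits: Dict[str, float], temperature: float, rng: Optional[random.Random] = None
) -> str:
    """Pick one token; temperature 0 means greedy search (always the top-scoring token)."""
    if not 0 <= temperature <= 1:
        # Mirrors the documented [0, 1] range; this check is illustrative only.
        raise ValueError("temperature must be in [0, 1]")
    if temperature == 0:
        return max(logits, key=logits.get)
    probs = token_probabilities(logits, temperature)
    tokens = list(probs)
    rng = rng or random.Random()
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical scores for the token after "why is the sky"
logits = {" blue": 2.0, " grey": 1.0, " green": 0.5}
print(repr(sample_next_token(logits, 0)))  # ' blue' (greedy)
print(token_probabilities(logits, 0.2)[" blue"], token_probabilities(logits, 0.8)[" blue"])
```

Lower temperatures concentrate probability on the top-scoring token (about 0.99 at 0.2 versus about 0.69 at 0.8 for these toy logits), which is why 0.2 gives more focused output than 0.8.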

Returns:

Name	Type	Description
`response`	`Union[CompletionSyncResponse, Iterator[CompletionStreamResponse]]`	The generated response (if `stream=False`) or iterator of response chunks (if `stream=True`)

Synchronous completion without token streaming (Python code, followed by the JSON response):

from llmengine import Completion

response = Completion.create(
    model="llama-2-7b",
    prompt="Hello, my name is",
    max_new_tokens=10,
    temperature=0.2,
)
print(response.json())

{
    "request_id": "8bbd0e83-f94c-465b-a12b-aabad45750a9",
    "output": {
        "text": "_______ and I am a _______",
        "num_completion_tokens": 10
    }
}

Token streaming can be used to reduce perceived latency for applications. Here is how applications can use streaming:

Synchronous completion with token streaming (Python code, followed by the JSON response):

from llmengine import Completion

stream = Completion.create(
    model="llama-2-7b",
    prompt="why is the sky blue?",
    max_new_tokens=5,
    temperature=0.2,
    stream=True,
)

for response in stream:
    if response.output:
        print(response.json())

{"request_id": "ebbde00c-8c31-4c03-8306-24f37cd25fa2", "output": {"text": "\n", "finished": false, "num_completion_tokens": 1 } }
{"request_id": "ebbde00c-8c31-4c03-8306-24f37cd25fa2", "output": {"text": "I", "finished": false, "num_completion_tokens": 2 } }
{"request_id": "ebbde00c-8c31-4c03-8306-24f37cd25fa2", "output": {"text": " don", "finished": false, "num_completion_tokens": 3 } }
{"request_id": "ebbde00c-8c31-4c03-8306-24f37cd25fa2", "output": {"text": "’", "finished": false, "num_completion_tokens": 4 } }
{"request_id": "ebbde00c-8c31-4c03-8306-24f37cd25fa2", "output": {"text": "t", "finished": true, "num_completion_tokens": 5 } }
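When you only need the final text, the streamed chunks can be concatenated as they arrive. The sketch below works on JSON lines like those printed above (request IDs shortened), appending each `output.text` and stopping at the chunk whose `finished` flag is true. `assemble_stream` is a hypothetical helper, not part of `llmengine`.

```python
import json
from typing import Iterable


def assemble_stream(lines: Iterable[str]) -> str:
    """Join the text of streamed completion chunks, stopping at the finished chunk."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        output = chunk.get("output")
        if not output:
            # Same guard as `if response.output:` in the example above.
            continue
        parts.append(output["text"])
        if output.get("finished"):
            break
    return "".join(parts)


# The five chunks shown above, as printed by response.json() (request IDs shortened)
chunks = [
    '{"request_id": "ebbde00c", "output": {"text": "\\n", "finished": false, "num_completion_tokens": 1}}',
    '{"request_id": "ebbde00c", "output": {"text": "I", "finished": false, "num_completion_tokens": 2}}',
    '{"request_id": "ebbde00c", "output": {"text": " don", "finished": false, "num_completion_tokens": 3}}',
    '{"request_id": "ebbde00c", "output": {"text": "’", "finished": false, "num_completion_tokens": 4}}',
    '{"request_id": "ebbde00c", "output": {"text": "t", "finished": true, "num_completion_tokens": 5}}',
]
print(repr(assemble_stream(chunks)))
```

With the client's response objects, the equivalent (assuming their attributes mirror the JSON fields) is appending `response.output.text` inside the `for response in stream:` loop.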

acreate `async` `classmethod`

acreate(
    model: str,
    prompt: str,
    max_new_tokens: int = 20,
    temperature: float = 0.2,
    stop_sequences: Optional[List[str]] = None,
    return_token_log_probs: Optional[bool] = False,
    timeout: int = 10,
    stream: bool = False,
) -> Union[
    CompletionSyncResponse,
    AsyncIterable[CompletionStreamResponse],
]

Creates a completion for the provided prompt and parameters asynchronously (with asyncio).

This API can be used to get the LLM to generate a completion asynchronously. It takes as parameters the model (see Model Zoo) and the prompt. Optionally, it takes max_new_tokens, temperature, stop_sequences, return_token_log_probs, timeout, and stream. It returns a CompletionSyncResponse if stream=False, or an async iterable of CompletionStreamResponse objects if stream=True; each response has request_id and output fields.

Parameters:

Name	Type	Description	Default
`model`	`str`	Name of the model to use. See Model Zoo for a list of Models that are supported.	required
`prompt`	`str`	The prompt to generate completions for, encoded as a string.	required
`max_new_tokens`	`int`	The maximum number of tokens to generate in the completion. The token count of your prompt plus `max_new_tokens` cannot exceed the model's context length. See Model Zoo for information on each supported model's context length.	`20`
`temperature`	`float`	What sampling temperature to use, in the range `[0, 1]`. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. When temperature is 0, greedy search is used.	`0.2`
`stop_sequences`	`Optional[List[str]]`	One or more sequences where the API will stop generating tokens for the current completion.	`None`
`return_token_log_probs`	`Optional[bool]`	Whether to return the log probabilities of generated tokens. When True, the response will include a list of tokens and their log probabilities.	`False`
`timeout`	`int`	Timeout in seconds. This is the maximum amount of time you are willing to wait for a response.	`10`
`stream`	`bool`	Whether to stream the response. If true, the return type is an `AsyncIterable[CompletionStreamResponse]`. Otherwise, the return type is a `CompletionSyncResponse`. When streaming, tokens will be sent as data-only server-sent events.	`False`
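Because the prompt's token count plus `max_new_tokens` must fit in the model's context length, it can help to clamp the requested value before calling `create` or `acreate`. This is a minimal sketch: `clamp_max_new_tokens` is a hypothetical helper, the prompt token count must come from the model's own tokenizer (not shown), and the 4096-token context length in the usage lines is an illustrative value, not a documented limit; see Model Zoo for real ones.

```python
def clamp_max_new_tokens(requested: int, prompt_tokens: int, context_length: int) -> int:
    """Largest max_new_tokens <= requested such that prompt + completion fits the context."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    available = context_length - prompt_tokens
    if available < 1:
        raise ValueError(
            f"prompt already uses {prompt_tokens} of {context_length} context tokens"
        )
    return min(requested, available)


# Illustrative numbers: a 4000-token prompt leaves room for only 96 new tokens in a 4096 window.
print(clamp_max_new_tokens(200, prompt_tokens=4000, context_length=4096))  # 96
print(clamp_max_new_tokens(20, prompt_tokens=10, context_length=4096))  # 20
```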

Returns:

Name	Type	Description
`response`	`Union[CompletionSyncResponse, AsyncIterable[CompletionStreamResponse]]`	The generated response (if `stream=False`) or iterator of response chunks (if `stream=True`)

Asynchronous completion without token streaming (Python code, followed by the JSON response):

import asyncio
from llmengine import Completion

async def main():
    response = await Completion.acreate(
        model="llama-2-7b",
        prompt="Hello, my name is",
        max_new_tokens=10,
        temperature=0.2,
    )
    print(response.json())

asyncio.run(main())

{
    "request_id": "9cfe4d5a-f86f-4094-a935-87f871d90ec0",
    "output": {
        "text": "_______ and I am a _______",
        "num_completion_tokens": 10
    }
}

Token streaming can be used to reduce perceived latency for applications. Here is how applications can use streaming:

Asynchronous completion with token streaming (Python code, followed by the JSON response):

import asyncio
from llmengine import Completion

async def main():
    stream = await Completion.acreate(
        model="llama-2-7b",
        prompt="why is the sky blue?",
        max_new_tokens=5,
        temperature=0.2,
        stream=True,
    )

    async for response in stream:
        if response.output:
            print(response.json())

asyncio.run(main())

{"request_id": "9cfe4d5a-f86f-4094-a935-87f871d90ec0", "output": {"text": "\n", "finished": false, "num_completion_tokens": 1}}
{"request_id": "9cfe4d5a-f86f-4094-a935-87f871d90ec0", "output": {"text": "I", "finished": false, "num_completion_tokens": 2}}
{"request_id": "9cfe4d5a-f86f-4094-a935-87f871d90ec0", "output": {"text": " think", "finished": false, "num_completion_tokens": 3}}
{"request_id": "9cfe4d5a-f86f-4094-a935-87f871d90ec0", "output": {"text": " the", "finished": false, "num_completion_tokens": 4}}
{"request_id": "9cfe4d5a-f86f-4094-a935-87f871d90ec0", "output": {"text": " sky", "finished": true, "num_completion_tokens": 5}}
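To put a wall-clock bound on how long your own code spends consuming an entire stream, `asyncio.wait_for` can wrap the `async for` loop. In this sketch `fake_stream` is a hypothetical stand-in for the object returned by `await Completion.acreate(..., stream=True)`; it yields objects whose `output.text` and `output.finished` attributes are assumed to mirror the JSON fields shown above, and `collect_text` is likewise a hypothetical helper.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterable


async def collect_text(stream: AsyncIterable[Any]) -> str:
    """Concatenate streamed chunk texts until a chunk reports finished."""
    parts = []
    async for response in stream:
        if response.output:
            parts.append(response.output.text)
            if response.output.finished:
                break
    return "".join(parts)


async def fake_stream():
    # Hypothetical stand-in for `await Completion.acreate(..., stream=True)`.
    texts = ["\n", "I", " think", " the", " sky"]
    for i, text in enumerate(texts, start=1):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield SimpleNamespace(
            output=SimpleNamespace(text=text, finished=i == len(texts), num_completion_tokens=i)
        )


async def main() -> str:
    # Raises asyncio.TimeoutError if the whole stream takes longer than 30 seconds.
    return await asyncio.wait_for(collect_text(fake_stream()), timeout=30)


text = asyncio.run(main())
print(repr(text))
```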

FineTune

Bases: APIEngine

FineTune API. This API is used to fine-tune models.

Fine-tuning is a process where the LLM is further trained on a task-specific dataset, allowing the model to adjust its parameters to better align with the task at hand. Fine-tuning is a supervised training phase, where prompt/response pairs are provided to optimize the performance of the LLM. LLM Engine currently uses LoRA for fine-tuning. Support for additional fine-tuning methods is upcoming.

LLM Engine provides APIs to create fine-tunes on a base model with training & validation datasets. APIs are also provided to list, cancel and retrieve fine-tuning jobs.

Creating a fine-tune will end with the creation of a Model, which you can view using Model.get(model_name) or delete using Model.delete(model_name).

create `classmethod`

create(
    model: str,
    training_file: str,
    validation_file: Optional[str] = None,
    hyperparameters: Optional[
        Dict[str, Union[str, int, float]]
    ] = None,
    wandb_config: Optional[Dict[str, Any]] = None,
    suffix: Optional[str] = None,
) -> CreateFineTuneResponse

Creates a job that fine-tunes a specified model with a given dataset.

This API can be used to fine-tune a model. The model is the name of the base model to fine-tune (see Model Zoo for available models). The training and validation files should consist of prompt and response pairs: training_file and validation_file must be publicly accessible HTTP or HTTPS URLs to a CSV file with two columns, prompt and response. A maximum of 100,000 rows of data is currently supported, and at least 200 rows are recommended before fine-tuning starts to show benefits. Sequences longer than the model's native max_seq_length are truncated.

A fine-tuning job can take roughly 30 minutes for a small dataset (~200 rows) and several hours for larger ones.
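Before hosting a dataset, it can be worth checking it against the constraints listed above: `prompt` and `response` columns, at most 100,000 rows, and ideally at least 200. This sketch uses only the standard `csv` module; `check_training_csv` is a hypothetical helper, and it cannot check max_seq_length truncation, which depends on the model's tokenizer.

```python
import csv
import io
from typing import List, TextIO

MAX_ROWS = 100_000          # documented maximum
RECOMMENDED_MIN_ROWS = 200  # documented recommendation


def check_training_csv(f: TextIO) -> List[str]:
    """Raise ValueError on hard errors; return a list of soft warnings."""
    reader = csv.DictReader(f)
    missing = {"prompt", "response"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    rows = 0
    for row_no, row in enumerate(reader, start=1):
        if not row["prompt"] or not row["response"]:
            raise ValueError(f"data row {row_no} has an empty prompt or response")
        rows += 1
    if rows > MAX_ROWS:
        raise ValueError(f"{rows} rows exceeds the {MAX_ROWS}-row limit")
    warnings = []
    if rows < RECOMMENDED_MIN_ROWS:
        warnings.append(f"only {rows} row(s); at least {RECOMMENDED_MIN_ROWS} are recommended")
    return warnings


sample = io.StringIO("prompt,response\nHow can I change my flight?,Use 'Manage my booking'.\n")
print(check_training_csv(sample))  # ['only 1 row(s); at least 200 are recommended']
```

For a file on disk, open it with `newline=""` (as the `csv` module recommends) and pass the file object in place of `sample`.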

Parameters:

Name	Type	Description	Default
`model`	`str`	The name of the base model to fine-tune. See Model Zoo for the list of available models to fine-tune.	required
`training_file`	`str`	Publicly accessible URL to a CSV file for training. When no validation_file is provided, one will automatically be created using a 10% split of the training_file data.	required
`validation_file`	`Optional[str]`	Publicly accessible URL to a CSV file for validation. The validation file is used to compute metrics which let LLM Engine pick the best fine-tuned checkpoint, which will be used for inference when fine-tuning is complete.	`None`
`hyperparameters`	`Optional[Dict[str, Union[str, int, float]]]`	A dict of hyperparameters to customize fine-tuning behavior. Currently supported hyperparameters: `lr`, the peak learning rate used during fine-tuning, which decays with a cosine schedule afterward (default: 2e-3); `warmup_ratio`, the ratio of training steps used for learning rate warmup (default: 0.03); `epochs`, the number of fine-tuning epochs, which should be less than 20 (default: 5); `weight_decay`, the regularization penalty applied to learned weights (default: 0.001).	`None`
`wandb_config`	`Optional[Dict[str, Any]]`	A dict of configuration parameters for Weights & Biases. See Weights & Biases for more information. Set `hyperparameters["report_to"]` to `wandb` to enable automatic fine-tune metrics logging. Must include an `api_key` field containing your Weights & Biases API key. Also supports setting `base_url` to use a custom Weights & Biases server.	`None`
`suffix`	`Optional[str]`	A string that will be added to your fine-tuned model name. If present, the entire fine-tuned model name will be formatted like `"[model].[suffix].[YYMMDD-HHMMSS]"`. If absent, the fine-tuned model name will be formatted `"[model].[YYMMDD-HHMMSS]"`. For example, if `suffix` is `"my-experiment"`, the fine-tuned model name could be `"llama-2-7b.my-experiment.230717-230150"`. Note: `suffix` must be between 1 and 28 characters long, and can only contain alphanumeric characters and hyphens.	`None`
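The suffix rules and naming pattern above can be checked locally before submitting a job. In this sketch `expected_model_name` is a hypothetical helper; the service assigns the actual timestamp, so the result only predicts the shape of the name, and reading "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re
from datetime import datetime
from typing import Optional

# 1-28 characters: ASCII letters, digits, and hyphens (assumed reading of the rule above)
_SUFFIX_RE = re.compile(r"[A-Za-z0-9-]{1,28}")


def expected_model_name(model: str, suffix: Optional[str], created: datetime) -> str:
    """Build "[model].[suffix].[YYMMDD-HHMMSS]", or "[model].[YYMMDD-HHMMSS]" without a suffix."""
    stamp = created.strftime("%y%m%d-%H%M%S")
    if suffix is None:
        return f"{model}.{stamp}"
    if not _SUFFIX_RE.fullmatch(suffix):
        raise ValueError(f"invalid suffix {suffix!r}: use 1-28 letters, digits, or hyphens")
    return f"{model}.{suffix}.{stamp}"


print(expected_model_name("llama-2-7b", "my-experiment", datetime(2023, 7, 17, 23, 1, 50)))
# llama-2-7b.my-experiment.230717-230150
```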

Returns:

Name	Type	Description
`CreateFineTuneResponse`	`CreateFineTuneResponse`	an object that contains the ID of the created fine-tuning job

Here is an example script to create a 5-row CSV of properly formatted data for fine-tuning an airline question answering bot:

Formatting data in Python

import csv

# Define data
data = [
  ("What is your policy on carry-on luggage?", "Our policy allows each passenger to bring one piece of carry-on luggage and one personal item such as a purse or briefcase. The maximum size for carry-on luggage is 22 x 14 x 9 inches."),
  ("How can I change my flight?", "You can change your flight through our website or mobile app. Go to 'Manage my booking' section, enter your booking reference and last name, then follow the prompts to change your flight."),
  ("What meals are available on my flight?", "We offer a variety of meals depending on the flight's duration and route. These can range from snacks and light refreshments to full-course meals on long-haul flights. Specific meal options can be viewed during the booking process."),
  ("How early should I arrive at the airport before my flight?", "We recommend arriving at least two hours before domestic flights and three hours before international flights."),
  ("Can I select my seat in advance?", "Yes, you can select your seat during the booking process or afterwards via the 'Manage my booking' section on our website or mobile app."),
  ]

# Write data to a CSV file
with open('customer_service_data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["prompt", "response"])
    writer.writerows(data)

Currently, data needs to be uploaded to a publicly accessible web URL so that it can be read for fine-tuning. Publicly accessible HTTP and HTTPS URLs are currently supported. Support for privately sharing data with the LLM Engine API is coming shortly. For quick iteration, you can use tools like Pastebin or GitHub Gists to host your CSV files publicly. An example GitHub Gist can be found here. To use a gist, use the URL shown when you click the “Raw” button.
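A quick local sanity check can catch URLs that the fine-tuning API cannot read, such as `s3://` paths or bare local file names. This sketch only checks that a URL is an absolute HTTP or HTTPS URL; it cannot tell whether the URL is actually publicly readable. `is_supported_data_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_supported_data_url(url: str) -> bool:
    """True for absolute http:// or https:// URLs, the only schemes documented above."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in [
    "https://my-bucket.s3.us-west-2.amazonaws.com/path/to/training-file.csv",
    "s3://my-bucket/path/to/training-file.csv",
    "customer_service_data.csv",
]:
    print(candidate, is_supported_data_url(candidate))
```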

Example code for fine-tuning:

Fine-tuning (Python code, followed by the JSON response):

from llmengine import FineTune

response = FineTune.create(
    model="llama-2-7b",
    training_file="https://my-bucket.s3.us-west-2.amazonaws.com/path/to/training-file.csv",
)

print(response.json())

{
    "fine_tune_id": "ft-cir3eevt71r003ks6il0"
}

get `classmethod`

get(fine_tune_id: str) -> GetFineTuneResponse

Get status of a fine-tuning job.

This API can be used to get the status of an already running fine-tuning job. It takes as a single parameter the fine_tune_id and returns a GetFineTuneResponse object with the id and status (PENDING, STARTED, UNDEFINED, FAILURE or SUCCESS).
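Since a job can run from about 30 minutes to several hours, a common pattern is to poll `FineTune.get` until the job reaches a final state. In this minimal sketch the status lookup and `sleep` are injected so it can run without the service; treating only SUCCESS and FAILURE as final (and PENDING, STARTED, and UNDEFINED as still in progress) is an assumption, and `wait_for_fine_tune` is a hypothetical helper.

```python
import time
from typing import Callable

FINAL_STATUSES = {"SUCCESS", "FAILURE"}  # assumption: the only terminal states


def wait_for_fine_tune(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a final status or max_polls is reached."""
    for _ in range(max_polls):
        status = get_status()
        if status in FINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Simulated status sequence. With the real client you might pass something like
# lambda: FineTune.get(fine_tune_id="ft-cir3eevt71r003ks6il0").status
# (field name taken from the JSON response shown below).
statuses = iter(["PENDING", "STARTED", "STARTED", "SUCCESS"])
naps = []
print(wait_for_fine_tune(lambda: next(statuses), sleep=naps.append))  # SUCCESS
print(len(naps))  # 3
```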

Parameters:

Name	Type	Description	Default
`fine_tune_id`	`str`	ID of the fine-tuning job	required

Returns:

Name	Type	Description
`GetFineTuneResponse`	`GetFineTuneResponse`	an object that contains the ID and status of the requested job

Getting the status of a fine-tuning job (Python code, followed by the JSON response):

from llmengine import FineTune

response = FineTune.get(
    fine_tune_id="ft-cir3eevt71r003ks6il0",
)

print(response.json())

{
    "fine_tune_id": "ft-cir3eevt71r003ks6il0",
    "status": "STARTED"
}

get_events `classmethod`

get_events(fine_tune_id: str) -> GetFineTuneEventsResponse

Get events of a fine-tuning job.

This API can be used to get the list of detailed events for a fine-tuning job. It takes the fine_tune_id as a parameter and returns a response object containing a list of the events that have occurred for the fine-tuning job. Two kinds of events are logged periodically: an evaluation of the training loss and an evaluation of the eval loss. This API returns all events for the fine-tuning job.

Parameters:

Name	Type	Description	Default
`fine_tune_id`	`str`	ID of the fine-tuning job	required

Returns:

Name	Type	Description
`GetFineTuneEventsResponse`	`GetFineTuneEventsResponse`	an object that contains the list of events for the fine-tuning job

Getting events for a fine-tuning job (Python code, followed by the JSON response):

from llmengine import FineTune

response = FineTune.get_events(fine_tune_id="ft-cir3eevt71r003ks6il0")
print(response.json())

{
    "events":
    [
        {
            "timestamp": 1689665099.6704428,
            "message": "{'loss': 2.108, 'learning_rate': 0.002, 'epoch': 0.7}",
            "level": "info"
        },
        {
            "timestamp": 1689665100.1966307,
            "message": "{'eval_loss': 1.67730712890625, 'eval_runtime': 0.2023, 'eval_samples_per_second': 24.717, 'eval_steps_per_second': 4.943, 'epoch': 0.7}",
            "level": "info"
        },
        {
            "timestamp": 1689665105.6544185,
            "message": "{'loss': 1.8961, 'learning_rate': 0.0017071067811865474, 'epoch': 1.39}",
            "level": "info"
        },
        {
            "timestamp": 1689665106.159139,
            "message": "{'eval_loss': 1.513688564300537, 'eval_runtime': 0.2025, 'eval_samples_per_second': 24.696, 'eval_steps_per_second': 4.939, 'epoch': 1.39}",
            "level": "info"
        }
    ]
}
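The event messages above are Python dict literals (single-quoted), so `json.loads` will not parse them, but `ast.literal_eval` can. This sketch extracts (epoch, value) series for the training and eval losses; `loss_curves` is a hypothetical helper, and skipping messages that are not dict literals is an assumption about what other events may look like.

```python
import ast
from typing import Dict, List, Optional, Tuple

Series = List[Tuple[Optional[float], float]]


def loss_curves(events: List[dict]) -> Dict[str, Series]:
    """Collect (epoch, value) pairs for 'loss' and 'eval_loss' from event messages."""
    curves: Dict[str, Series] = {"loss": [], "eval_loss": []}
    for event in events:
        try:
            metrics = ast.literal_eval(event["message"])
        except (ValueError, SyntaxError):
            continue  # not a Python literal; ignore
        if not isinstance(metrics, dict):
            continue
        for key, series in curves.items():
            if key in metrics:
                series.append((metrics.get("epoch"), metrics[key]))
    return curves


# The four events shown above, with timestamps, levels, and some metrics omitted
events = [
    {"message": "{'loss': 2.108, 'learning_rate': 0.002, 'epoch': 0.7}"},
    {"message": "{'eval_loss': 1.67730712890625, 'eval_runtime': 0.2023, 'epoch': 0.7}"},
    {"message": "{'loss': 1.8961, 'learning_rate': 0.0017071067811865474, 'epoch': 1.39}"},
    {"message": "{'eval_loss': 1.513688564300537, 'eval_runtime': 0.2025, 'epoch': 1.39}"},
]
curves = loss_curves(events)
print(curves["loss"])  # [(0.7, 2.108), (1.39, 1.8961)]
best = min(curves["eval_loss"], key=lambda point: point[1])
print(best)  # the epoch-1.39 point has the lowest eval loss
```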

list `classmethod`

list() -> ListFineTunesResponse

List fine-tuning jobs.

This API can be used to list all the fine-tuning jobs. It returns a list of pairs of fine_tune_id and status for all existing jobs.

Returns:

Name	Type	Description
`ListFineTunesResponse`	`ListFineTunesResponse`	an object that contains a list of all fine-tuning jobs and their statuses

Listing fine-tuning jobs (Python code, followed by the JSON response):

from llmengine import FineTune

response = FineTune.list()
print(response.json())

{
    "jobs": [
        {
            "fine_tune_id": "ft-cir3eevt71r003ks6il0",
            "status": "STARTED"
        },
        {
            "fine_tune_id": "ft_def456",
            "status": "SUCCESS"
        }
    ]
}

cancel `classmethod`

cancel(fine_tune_id: str) -> CancelFineTuneResponse

Cancel a fine-tuning job.

This API can be used to cancel an existing fine-tuning job if it's no longer required. It takes the fine_tune_id as a parameter and returns a response object with a success field indicating whether the cancellation was successful.

Parameters:

Name	Type	Description	Default
`fine_tune_id`	`str`	ID of the fine-tuning job	required

Returns:

Name	Type	Description
`CancelFineTuneResponse`	`CancelFineTuneResponse`	an object that contains whether the cancellation was successful

Cancelling a fine-tuning job (Python code, followed by the JSON response):

from llmengine import FineTune

response = FineTune.cancel(fine_tune_id="ft-cir3eevt71r003ks6il0")
print(response.json())

{
    "success": true
}

Model

Bases: APIEngine

Model API. This API is used to get, list, and delete models. Models include both base models built into LLM Engine, and fine-tuned models that you create through the FineTune.create() API.

See Model Zoo for the list of publicly available base models.

get `classmethod`

get(model: str) -> GetLLMEndpointResponse

Get information about an LLM model.

This API can be used to get information about a model's source and inference framework. For self-hosted users, it returns additional information about the number of shards, quantization, infrastructure settings, etc. The function takes the model name as its single parameter and returns a GetLLMEndpointResponse object.

Parameters:

Name	Type	Description	Default
`model`	`str`	Name of the model	required

Returns:

Name	Type	Description
`GetLLMEndpointResponse`	`GetLLMEndpointResponse`	object representing the LLM and configurations

Accessing a model (Python code, followed by the JSON response):

from llmengine import Model

response = Model.get("llama-2-7b.suffix.2023-07-18-12-00-00")

print(response.json())

{
    "id": null,
    "name": "llama-2-7b.suffix.2023-07-18-12-00-00",
    "model_name": null,
    "source": "hugging_face",
    "status": "READY",
    "inference_framework": "text_generation_inference",
    "inference_framework_tag": null,
    "num_shards": null,
    "quantize": null,
    "spec": null
}

list `classmethod`

list() -> ListLLMEndpointsResponse

List LLM models available to call inference on.

This API can be used to list all available models, including both publicly available models and user-created fine-tuned models. It returns a list of GetLLMEndpointResponse objects for all models. The most important field is the model name.

Returns:

Name	Type	Description
`ListLLMEndpointsResponse`	`ListLLMEndpointsResponse`	list of models

Listing available models (Python code, followed by the JSON response):

from llmengine import Model

response = Model.list()
print(response.json())

{
    "model_endpoints": [
        {
            "id": null,
            "name": "llama-2-7b.suffix.2023-07-18-12-00-00",
            "model_name": null,
            "source": "hugging_face",
            "inference_framework": "text_generation_inference",
            "inference_framework_tag": null,
            "num_shards": null,
            "quantize": null,
            "spec": null
        },
        {
            "id": null,
            "name": "llama-2-7b",
            "model_name": null,
            "source": "hugging_face",
            "inference_framework": "text_generation_inference",
            "inference_framework_tag": null,
            "num_shards": null,
            "quantize": null,
            "spec": null
        },
        {
            "id": null,
            "name": "llama-13b-deepspeed-sync",
            "model_name": null,
            "source": "hugging_face",
            "inference_framework": "deepspeed",
            "inference_framework_tag": null,
            "num_shards": null,
            "quantize": null,
            "spec": null
        },
        {
            "id": null,
            "name": "falcon-40b",
            "model_name": null,
            "source": "hugging_face",
            "inference_framework": "text_generation_inference",
            "inference_framework_tag": null,
            "num_shards": null,
            "quantize": null,
            "spec": null
        }
    ]
}
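Since the model name is the value you pass to `Completion.create`, a small check against the listing can catch typos early. This sketch reads JSON shaped like the listing above (as returned by `response.json()`, trimmed to the `name` field) and suggests near matches with `difflib`; `model_names` and `require_model` are hypothetical helpers.

```python
import difflib
import json
from typing import List


def model_names(listing_json: str) -> List[str]:
    """Extract model names from Model.list() JSON like the response shown above."""
    return [endpoint["name"] for endpoint in json.loads(listing_json)["model_endpoints"]]


def require_model(name: str, available: List[str]) -> str:
    """Return name if it is available; otherwise raise with close suggestions."""
    if name in available:
        return name
    suggestions = difflib.get_close_matches(name, available, n=3)
    raise ValueError(f"unknown model {name!r}; did you mean {suggestions}?")


listing = json.dumps({"model_endpoints": [
    {"name": "llama-2-7b.suffix.2023-07-18-12-00-00"},
    {"name": "llama-2-7b"},
    {"name": "llama-13b-deepspeed-sync"},
    {"name": "falcon-40b"},
]})
names = model_names(listing)
print(require_model("llama-2-7b", names))
try:
    require_model("llama2-7b", names)
except ValueError as err:
    print(err)
```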

delete `classmethod`

delete(model: str) -> DeleteLLMEndpointResponse

Deletes an LLM model.

This API can be used to delete a fine-tuned model. It takes the name of the model as a parameter and returns a response object with a deleted field indicating whether the deletion was successful. If called on a base model included with LLM Engine, an error is raised.

Parameters:

Name	Type	Description	Default
`model`	`str`	Name of the model	required

Returns:

Name	Type	Description
`response`	`DeleteLLMEndpointResponse`	whether the model was successfully deleted

Deleting a model (Python code, followed by the JSON response):

from llmengine import Model

response = Model.delete("llama-2-7b.suffix.2023-07-18-12-00-00")
print(response.json())

{
    "deleted": true
}

download `classmethod`

download(
    model_name: str, download_format: str = "hugging_face"
) -> ModelDownloadResponse

Download a fine-tuned model.

This API can be used to download the resulting model from a fine-tuning job. It takes model_name and download_format as parameters and returns a response object containing a dictionary of (filename, URL) pairs for the fine-tuned model's files. You can then fetch these URLs to obtain the fine-tuned model. If called on a nonexistent model, an error is raised.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	name of the fine-tuned model	required
`download_format`	`str`	download format requested (default=hugging_face)	`'hugging_face'`

Returns:

Name	Type	Description
`ModelDownloadResponse`	`ModelDownloadResponse`	an object that contains a dictionary mapping filenames to the URLs from which to download the model weights. The URLs are presigned URLs that grant temporary access and expire after an hour.

Downloading a model (Python code, followed by the JSON response):

from llmengine import Model

response = Model.download("llama-2-7b.suffix.2023-07-18-12-00-00", download_format="hugging_face")
print(response.json())

{
    "urls": {"my_model_file": "https://url-to-my-model-weights"}
}
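Because the returned URLs are presigned and expire after an hour, it is best to fetch them soon after calling `Model.download`. A minimal standard-library sketch, assuming `urls` is the filename-to-URL mapping shown in the JSON above; `download_weights` is a hypothetical helper, and large weight files may call for retries or resumable downloads that this sketch omits.

```python
import os
import shutil
import urllib.request
from typing import Dict


def download_weights(urls: Dict[str, str], dest_dir: str) -> Dict[str, str]:
    """Fetch each URL into dest_dir; return a filename -> local path mapping."""
    os.makedirs(dest_dir, exist_ok=True)
    local_paths = {}
    for filename, url in urls.items():
        # basename() keeps a returned name like "a/b.bin" from escaping dest_dir
        local_path = os.path.join(dest_dir, os.path.basename(filename))
        with urllib.request.urlopen(url) as src, open(local_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream to disk without loading it all into memory
        local_paths[filename] = local_path
    return local_paths


# Usage with the client (not run here):
# import json
# urls = json.loads(response.json())["urls"]
# download_weights(urls, "fine_tuned_model")
```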

File

Bases: APIEngine

File API. This API is used to upload private files to LLM Engine so that fine-tunes can access them for training and validation data.

Functions are provided to upload, get, list, and delete files, as well as to get the contents of a file.

upload `classmethod`

upload(file: BufferedReader) -> UploadFileResponse

Uploads a file to LLM Engine.

Parameters:

Name	Type	Description	Default
`file`	`BufferedReader`	A file opened in binary mode, e.g. with open(file_path, "rb")	required

Returns:

Name	Type	Description
`UploadFileResponse`	`UploadFileResponse`	an object that contains the ID of the uploaded file

Uploading a file (Python code, followed by the JSON response):

from llmengine import File

with open("training_dataset.csv", "rb") as f:
    response = File.upload(f)

print(response.json())

{
    "id": "file-abc123"
}

get `classmethod`

get(file_id: str) -> GetFileResponse

Get file metadata, including filename and size.

Parameters:

Name	Type	Description	Default
`file_id`	`str`	ID of the file	required

Returns:

Name	Type	Description
`GetFileResponse`	`GetFileResponse`	an object that contains the ID, filename, and size of the requested file

Getting metadata about a file (Python code, followed by the JSON response):

from llmengine import File

response = File.get(
    file_id="file-abc123",
)

print(response.json())

{
    "id": "file-abc123",
    "filename": "training_dataset.csv",
    "size": 100
}

download `classmethod`

download(file_id: str) -> GetFileContentResponse

Get the contents of a file as a string. If the uploaded file is binary, a string encoding of its contents is returned.

Parameters:

Name	Type	Description	Default
`file_id`	`str`	ID of the file	required

Returns:

Name	Type	Description
`GetFileContentResponse`	`GetFileContentResponse`	an object that contains the ID and content of the file

Getting file content (Python code, followed by the JSON response):

from llmengine import File

response = File.download(file_id="file-abc123")
print(response.json())

{
    "id": "file-abc123",
    "content": "Hello world!"
}
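If the uploaded file was a CSV like the one built earlier, the `content` string returned by `File.download` can be parsed directly with the `csv` module, for example to confirm that the upload round-tripped intact. `rows_from_content` is a hypothetical helper; it assumes a text CSV and is not meaningful for binary uploads.

```python
import csv
import io
from typing import Dict, List


def rows_from_content(content: str) -> List[Dict[str, str]]:
    """Parse a downloaded CSV string into one dict per data row."""
    return list(csv.DictReader(io.StringIO(content)))


content = "prompt,response\nHow can I change my flight?,Use 'Manage my booking'.\n"
rows = rows_from_content(content)
print(len(rows), rows[0]["prompt"])  # 1 How can I change my flight?
```

With the client, pass the `content` field from the response shown above in place of the sample string.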

list `classmethod`

list() -> ListFilesResponse

List metadata about all files, e.g. their filenames and sizes.

Returns:

Name	Type	Description
`ListFilesResponse`	`ListFilesResponse`	an object that contains a list of all files and their filenames and sizes

Listing files (Python code, followed by the JSON response):

from llmengine import File

response = File.list()
print(response.json())

{
    "files": [
        {
            "id": "file-abc123",
            "filename": "training_dataset.csv",
            "size": 100
        },
        {
            "id": "file-def456",
            "filename": "validation_dataset.csv",
            "size": 50
        }
    ]
}

delete `classmethod`

delete(file_id: str) -> DeleteFileResponse

Deletes a file.

Parameters:

Name	Type	Description	Default
`file_id`	`str`	ID of the file	required

Returns:

Name	Type	Description
`DeleteFileResponse`	`DeleteFileResponse`	an object that contains whether the deletion was successful

Deleting a file (Python code, followed by the JSON response):

from llmengine import File

response = File.delete(file_id="file-abc123")
print(response.json())

{
    "deleted": true
}
