Product

Select Language

Product

Select Language

SiliconCloud, Production Ready
Cloud with Low Cost

Teaming up with excellent open-source foundation models.

Models→

Playground →

Product

Select Language

SiliconCloud, Production Ready
Cloud with Low Cost

Teaming up with excellent open-source foundation models.

Models→

Playground →

Top-quality model services

01.

Chat

SiliconCloud delivers efficient, user-friendly, and scalable LLM models, with an out-of-the-box inference acceleration capability, including Qwen、DeepSeek、 GLM4, etc.

Get Started→

02.

Image

SiliconCloud encompasses a diverse range of text-to-image and text-to-video models, such as Kolors.

Get Started→

03.

SiliconCloud also offers other efficient and feature-rich model categories, including embedding, reranker, voice, and video generation models.

Get Started→

01.

Chat

SiliconCloud delivers efficient, user-friendly, and scalable LLM models, with an out-of-the-box inference acceleration capability, including Qwen、DeepSeek、 GLM4, etc.

Get Started→

02.

Image

SiliconCloud encompasses a diverse range of text-to-image and text-to-video models, such as Kolors.

Get Started→

03.

SiliconCloud also offers other efficient and feature-rich model categories, including embedding, reranker, voice, and video generation models.

Get Started→

Cloud inference services based on excellent open-source models.

MaaS

Enterprise-level all-scenario model service

MaaS

Cloud inference services based on excellent open-source models.

01.

Chat

SiliconCloud delivers efficient, user-friendly, and scalable LLM models, with an out-of-the-box inference acceleration capability, including Qwen、DeepSeek、 GLM4, etc.

Get Started→

02.

Image

03.

01.

Chat

SiliconCloud delivers efficient, user-friendly, and scalable LLM models, with an out-of-the-box inference acceleration capability, including Qwen、DeepSeek、 GLM4, etc.

Get Started→

02.

Image

03.

Model Fine-Tune and Deploying

designed for large-scale model fine-tuning and deploying. Through the platform, users can quickly and seamlessly deploy custom models as services and fine-tune them based on the data uploaded.

Data Upload

Build a suitable dataset and upload it for creating fine-tuning jobs. The data set consists of a single JSONL file, where each line is a separate training data.

Step.01→

Fine-tuning

Select the appropriate dataset and adjust the relevant parameters to improve the model effect and meet the customization needs.

Step.02→

Effect Evaluation

Upload the evaluation dataset. Evaluate the effect of the trained model, and choose tge best one for deployment.

Step.03→

Model Deploying

Deploy the fine-tuned model on the cloud platform and call it through APIs.

Step.04

One-Stop: From Fine-Tune to Deploying

Data Upload

Build a suitable dataset and upload it for creating fine-tuning jobs. The data set consists of a single JSONL file, where each line is a separate training data.

Step.01→

Fine-tuning

Select the appropriate dataset and adjust the relevant parameters to improve the model effect and meet the customization needs.

Step.02→

Effect Evaluation

Upload the evaluation dataset. Evaluate the effect of the trained model, and choose tge best one for deployment.

Step.03→

Model Deploying

Deploy the fine-tuned model on the cloud platform and call it through APIs.

Step.04

One-Stop: From Fine-Tune to Deploying

Data Upload

Build a suitable dataset and upload it for creating fine-tuning jobs. The data set consists of a single JSONL file, where each line is a separate training data.

Step.01→

Fine-tuning

Select the appropriate dataset and adjust the relevant parameters to improve the model effect and meet the customization needs.

Step.02→

Effect Evaluation

Upload the evaluation dataset. Evaluate the effect of the trained model, and choose tge best one for deployment.

Step.03→

Model Deploying

Deploy the fine-tuned model on the cloud platform and call it through APIs.

Step.04

High performance, flexible, and ease to use

Blazing fsat model inference

Time latency of LLM is reduced by up to 2.7 times

Max Concurrent Requsets

Speed of text to image is increased by 3 times

Image 1024*1024, batch size, steps 30, on A100 80GB SXM4

End2End Time (sec)

Auto-scaling on demand

Create an auto-scaling group which contains a collection of SiliconCloud instances.

Specify minimum and maximum numbers of instance in that group.

Specify desired capacity and auto-scaling policies.

Created successfully. The platform will automatically scales the service on demand.

Specify minimum and maximum numbers of instance in that group.

Created successfully. The platform will automatically scales the service on demand.

Easy to use

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.siliconflow.cn/v1")

response = client.chat.completions.create(

model='deepseek-ai/DeepSeek-V2.5',

messages=[

{'role': 'user',

'content': "SiliconCloud推出分层速率方案与免费模型RPM提升10倍，对于整个大模型应用领域带来哪些改变？"}

],

stream=True

)

for chunk in response:

print(chunk.choices[0].delta.content, end='')

Model Inference

With just a few lines of code, developers can quickly use SiliconCloud's rapid mockup service.

Model Deploy

Upload your workflow and Download the callable Model Service API.

Reduce the chances of application downtime with auto scaling.

Accelerate your workflow as needed.

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.siliconflow.cn/v1")

response = client.chat.completions.create(

model='deepseek-ai/DeepSeek-V2.5',

messages=[

{'role': 'user',

'content': "SiliconCloud推出分层速率方案与免费模型RPM提升10倍，对于整个大模型应用领域带来哪些改变？"}

],

stream=True

)

for chunk in response:

print(chunk.choices[0].delta.content, end='')

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.siliconflow.cn/v1")

response = client.chat.completions.create(

model='deepseek-ai/DeepSeek-V2.5',

messages=[

{'role': 'user',

'content': "SiliconCloud推出分层速率方案与免费模型RPM提升10倍，对于整个大模型应用领域带来哪些改变？"}

],

stream=True

)

for chunk in response:

print(chunk.choices[0].delta.content, end='')

Model Inference

With just a few lines of code, developers can quickly use SiliconCloud's rapid mockup service.

Model Deploy

Upload your workflow and Download the callable Model Service API.

Reduce the chances of application downtime with auto scaling.

Accelerate your workflow as needed.

Service Mode

Serverless Deployment

Built for developers

High-performance inference, industry-leading speed

Diverse models, covering multiple scenarios

Pay-as-you-go, per-token pricing

Serverless rate limits

Get Started→

On-demand Deployment

Enhanced for start-ups

Custom models tailored to your needs

Configurable strategies optimization

Isolated resources for high QoS

Custom enterprise rate limiting

Reserved Capacity

Enhanced for advanced enterprises

Custom models tailored to your needs

Configurable strategies optimization

Isolated resources for high QoS

Custom enterprise rate limiting

Competitive Unit Pricing

Prioritize using the latest product features

Service Mode

Serverless Deployment

Built for developers

High-performance inference, industry-leading speed

Diverse models, covering multiple scenarios

Pay-as-you-go, per-token pricing

Serverless rate limits

Get Started→

On-demand Deployment

Enhanced for start-ups

Custom models tailored to your needs

Configurable strategies optimization

Isolated resources for high QoS

Custom enterprise rate limiting

Reserved Capacity

Enhanced for advanced enterprises

Custom models tailored to your needs

Configurable strategies optimization

Isolated resources for high QoS

Custom enterprise rate limiting

Competitive Unit Pricing

Prioritize using the latest product features

Service Mode

Serverless Deployment

Built for developers

High-performance inference, industry-leading speed

Diverse models, covering multiple scenarios

Pay-as-you-go, per-token pricing

Serverless rate limits

Get Started→

On-demand Deployment

Enhanced for start-ups

Custom models tailored to your needs

Configurable strategies optimization

Isolated resources for high QoS

Custom enterprise rate limiting

Reserved Capacity

Enhanced for advanced enterprises

Custom models tailored to your needs

Configurable strategies optimization

Isolated resources for high QoS

Custom enterprise rate limiting

Competitive Unit Pricing

Prioritize using the latest product features

Service Mode

Serverless Deployment

Built for developers

High-performance inference, industry-leading speed

Diverse models, covering multiple scenarios

Pay-as-you-go, per-token pricing

Serverless rate limits

Get Started→

On-demand Deployment

Enhanced for start-ups

Custom models tailored to your needs

Configurable strategies optimization

Isolated resources for high QoS

Custom enterprise rate limiting

Reserved Capacity

Enhanced for advanced enterprises

Custom models tailored to your needs

Configurable strategies optimization

Isolated resources for high QoS

Custom enterprise rate limiting

Competitive Unit Pricing

Prioritize using the latest product features

Accelerate AGI to Benefit Humanity

Pages

Products

Company

Legal

Accelerate AGI to Benefit Humanity

Pages

Products

Legal

Company

Accelerate AGI to Benefit Humanity

Pages

Products

Company

Legal

OneDiff, High-performance
Image Generation Engine

Teaming up with excellent open-source foundation models.

Select Language

SiliconCloud, Production Ready
Cloud with Low Cost

Models→

Playground →

SiliconCloud, Production Ready Cloud with Low Cost

SiliconCloud, Production Ready Cloud with Low Cost

Top-quality model services

Chat

Image

More

Chat

Image

More

MaaS

Enterprise-level all-scenario model service

Enterprise-level all-scenario model service

MaaS

MaaS

Chat

Image

More

Chat

Image

More

Model Fine-Tune and Deploying

Model Fine-Tune and Deploying

Model Fine-Tune and Deploying

designed for large-scale model fine-tuning and deploying. Through the platform, users can quickly and seamlessly deploy custom models as services and fine-tune them based on the data uploaded.

designed for large-scale model fine-tuning and deploying. Through the platform, users can quickly and seamlessly deploy custom models as services and fine-tune them based on the data uploaded.

Data Upload

Fine-tuning

Effect Evaluation

Model Deploying

One-Stop: From Fine-Tune to Deploying

One-Stop: From Fine-Tune to Deploying

One-Stop: From Fine-Tune to Deploying

High performance, flexible, and ease to use

High performance, flexible, and ease to use

High performance, flexible, and ease to use

Blazing fsat model inference

Blazing fsat model inference

Blazing fsat model inference

Time latency of LLM is reduced by up to 2.7 times

Time latency of LLM is reduced by up to 2.7 times

Time latency of LLM is reduced by up to 2.7 times

Speed of text to image is increased by 3 times

Speed of text to image is increased by 3 times

Speed of text to image is increased by 3 times

Auto-scaling on demand

Auto-scaling on demand

Auto-scaling on demand

Easy to use

Easy to use

Easy to use

Model Inference

With just a few lines of code, developers can quickly use SiliconCloud's rapid mockup service.

Model Deploy

Model Inference

Model Inference

With just a few lines of code, developers can quickly use SiliconCloud's rapid mockup service.

With just a few lines of code, developers can quickly use SiliconCloud's rapid mockup service.

Model Deploy

Model Deploy

Service Mode

Serverless Deployment

On-demand Deployment

Reserved Capacity

Service Mode

Service Mode

Service Mode

OneDiff, High-performance Image Generation Engine

SiliconCloud, Production Ready Cloud with Low Cost

SiliconCloud, Production Ready
Cloud with Low Cost

SiliconCloud, Production Ready
Cloud with Low Cost

OneDiff, High-performance
Image Generation Engine

SiliconCloud, Production Ready
Cloud with Low Cost