WindyHana's Nurisarangbang. Talk about this and that.
$ mkdir DeepSeek-R1-Distill-Qwen-1.5B
$ cd DeepSeek-R1-Distill-Qwen-1.5B
$ python3 -m venv venv
$ . venv/bin/activate
(venv) $ pip install vllm
(venv) $ vi run.py
from vllm import LLM, SamplingParams

# Load the model in half precision and cap the context length
# so it fits in GPU memory.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    dtype="half",
    max_model_len=8_000,
)

sampling_params = SamplingParams(
    temperature=0.5,
    max_tokens=8_000,
)

prompts = [
    """The day before yesterday I was 25. The next year I will be 28. This is true only one day in a year. What day is my Birthday?"""
]

# Wrap each prompt as a single-turn chat conversation.
conversations = [
    [{"role": "user", "content": x}] for x in prompts
]

outputs = llm.chat(conversations, sampling_params=sampling_params)
for output in outputs:
    print(output.outputs[0].text)
(venv) $ python run.py
INFO 02-00 18:51:00 __init__.py:183] Automatically detected platform cuda.
WARNING 02-00 18:51:02 config.py:2368] Casting torch.bfloat16 to torch.float16.
INFO 02-00 18:51:07 config.py:526] This model supports multiple tasks: ...
INFO 02-00 18:51:07 llm_engine.py:232] Initializing a V0 LLM engine (v0.7.1) ...
INFO 02-00 18:51:08 cuda.py:184] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 02-00 18:51:08 cuda.py:232] Using XFormers backend.
INFO 02-00 18:51:09 model_runner.py:1111] Starting to load model
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...
INFO 02-00 18:51:09 weight_utils.py:251] Using model weights format ['*.safetensors']
INFO 02-00 18:51:10 weight_utils.py:296] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 1.06it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 1.06it/s]
INFO 02-00 18:51:11 model_runner.py:1116] Loading model weights took 3.3460 GB
INFO 02-00 18:51:12 worker.py:266] Memory profiling takes 1.05 seconds
INFO 02-00 18:51:12 worker.py:266] the current vLLM instance can use
total_gpu_memory (11.55GiB) x gpu_memory_utilization (0.90) = 10.40GiB
INFO 02-00 18:51:12 worker.py:266] model weights take 3.35GiB;
non_torch_memory takes 0.04GiB; PyTorch activation peak memory takes 1.41GiB;
the rest of the memory reserved for KV Cache is 5.60GiB.
INFO 02-00 18:51:12 executor_base.py:108] # CUDA blocks: 13097, # CPU blocks: 9362
INFO 02-00 18:51:12 executor_base.py:113] Maximum concurrency for 8000 tokens per request: 26.19x
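The memory and concurrency figures in the log can be sanity-checked with simple arithmetic (a sketch using the GiB values and block count from the log above; the 16-token block size is vLLM's default and is an assumption here):

```python
# Sanity-check the vLLM startup log's memory and concurrency figures.

total_gpu_memory = 11.55          # GiB, from the log
gpu_memory_utilization = 0.90     # vLLM's default fraction
model_weights = 3.35              # GiB
non_torch_memory = 0.04           # GiB
activation_peak = 1.41            # GiB

# Budget available to vLLM, then what is left over for the KV cache.
budget = total_gpu_memory * gpu_memory_utilization
kv_cache = budget - model_weights - non_torch_memory - activation_peak
print(f"budget = {budget:.2f} GiB, KV cache = {kv_cache:.2f} GiB")

# Each CUDA block holds block_size tokens of KV cache; with 13097 blocks
# the engine can keep roughly 26 full-length (8000-token) requests in flight.
num_cuda_blocks = 13097
block_size = 16                   # assumption: vLLM's default block size
max_model_len = 8_000
concurrency = num_cuda_blocks * block_size / max_model_len
print(f"max concurrency = {concurrency:.2f}x")
```

The results line up with the log's 10.40 GiB budget, 5.60 GiB KV cache, and 26.19x concurrency.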
INFO 02-00 18:51:14 model_runner.py:1435] Capturing cudagraphs for decoding.
This may lead to unexpected consequences if the model is not static.
To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
If out-of-memory error occurs during cudagraph capture,
consider decreasing `gpu_memory_utilization` or switching to eager mode.
You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|█████████████| 35/35 [00:09<00:00, 3.63it/s]
INFO 02-00 18:51:24 model_runner.py:1563] Graph capturing finished in 10 secs, took 0.19 GiB
INFO 02-00 18:51:24 llm_engine.py:429] init engine (profile, create kv cache, warmup model)
took 13.48 seconds
INFO 02-00 18:51:24 chat_utils.py:330] Detected the chat template content format to be 'string'.
You can set `--chat-template-content-format` to override this.
Processed prompts: 100%|███████████████████████
[00:04<00:00, 4.14s/it, est. speed input: 9.66 toks/s, output: 81.16 toks/s]
<think>
To determine the day of the birthday, I start by analyzing the information given.
First, it's stated that the day before yesterday, the person was 25.
This means that yesterday was the birthday day, and today is the day after birthday.
Next, it's mentioned that the next year, the person will be 28.
This implies that the current age is 25 plus 1, which is 26.
Since the person will be 28 in the next year, their birthday must be on December 31st.
This is because the next year's birthday will occur on the same date,
and the age will increase by one year.
Therefore, the birthday is December 31st.
</think>
To determine the day of your birthday, let's analyze the information step by step:
1. **Day Before Yesterday:**
- You were **25** years old the day before yesterday.
- This means **yesterday** was your birthday day.
2. **Next Year:**
- You will be **28** years old in the next year.
- If you are currently 26 years old, your birthday will be on **December 31st** because:
- **Today** is **December 31st**.
- **Tomorrow** will be **January 1st**, and you'll turn **27**.
- **Next Year** (two years from now) you'll be **28** years old.
**Conclusion:**
Your birthday is on **\boxed{December 31st}**.
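The startup log warns that CUDA graph capture can itself run out of memory, and points at eager mode and the GPU memory fraction as the knobs to turn. A minimal tweak to `run.py` along those lines (a sketch; `enforce_eager` and `gpu_memory_utilization` are standard `LLM` constructor arguments, and the 0.85 value is only an illustrative assumption):

```python
from vllm import LLM

# Skip CUDA graph capture entirely (slower decoding, lower peak memory)
# and reserve slightly less of the GPU to leave extra headroom.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    dtype="half",
    max_model_len=8_000,
    enforce_eager=True,            # run eagerly instead of capturing graphs
    gpu_memory_utilization=0.85,   # assumption: a bit below the 0.90 default
)
```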
DeepSeek R1 benchmark.
NVIDIA H100
NVIDIA B200
NVIDIA GB200 (Grace Blackwell Superchip)
NVIDIA GB200 NVL72