# PatchTST[[patchtst]]

## Overview[[overview]]

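The PatchTST model was proposed in [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://huggingface.co/papers/2211.14730) by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong and Jayant Kalagnanam.

At a high level, the model vectorizes a time series into patches of a given size and encodes the resulting sequence of vectors with a Transformer, which then outputs a forecast of the prediction length through an appropriate head. The model is illustrated in the following figure: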
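
The patching step described above can be sketched with plain PyTorch. This is a minimal illustration, not the library's implementation: the helper name `patchify` and the example sizes are assumptions, and the input layout `(batch_size, sequence_length, num_input_channels)` follows the `past_values` shape used by the models on this page.

```python
import torch


def patchify(past_values: torch.Tensor, patch_length: int, patch_stride: int) -> torch.Tensor:
    """Split (batch, sequence_length, channels) into (possibly overlapping) patches.

    Returns a tensor of shape (batch, channels, num_patches, patch_length).
    Hypothetical helper for illustration only.
    """
    # unfold slides a window of size patch_length along the time axis (dim=1)
    # and appends the window contents as a new last dimension:
    # (batch, num_patches, channels, patch_length)
    patches = past_values.unfold(1, patch_length, patch_stride)
    # Put channels before patches so each channel forms its own token sequence
    return patches.transpose(1, 2).contiguous()


past_values = torch.randn(4, 512, 7)  # 4 series, 512 time steps, 7 channels
patches = patchify(past_values, patch_length=16, patch_stride=8)

# Without padding, the number of patches is (L - P) // S + 1
num_patches = (512 - 16) // 8 + 1
print(patches.shape, num_patches)
```

Each patch then plays the role that a word embedding plays in a language model: it is projected to a vector and fed to the Transformer as one token.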

![model](https://github.com/namctin/transformers/assets/8100/150af169-29de-419a-8d98-eb78251c21fa)

The abstract from the paper is the following:

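*We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches which are served as input tokens to Transformer; (ii) channel-independence where each channel contains a single univariate time series that shares the same embedding and Transformer weights across all the series. Patching design naturally has three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced given the same look-back window; and the model can attend longer history. Our channel-independent patch time series Transformer (PatchTST) can improve the long-term forecasting accuracy significantly when compared with that of SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring of masked pre-trained representation on one dataset to others also produces SOTA forecasting accuracy.*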
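
To make the claim about quadratically reduced attention maps concrete, here is a small arithmetic sketch. The helper `num_patches` and the values 512, 16 and 8 are illustrative assumptions (and no padding is applied); they are not taken from the library.

```python
def num_patches(context_length: int, patch_length: int, patch_stride: int) -> int:
    """Number of patches obtained without padding (illustrative helper)."""
    return (context_length - patch_length) // patch_stride + 1


context_length = 512                 # look-back window in time steps (example value)
patch_length, patch_stride = 16, 8   # example values

tokens_pointwise = context_length    # one token per time step
tokens_patched = num_patches(context_length, patch_length, patch_stride)

# Self-attention stores one score per pair of tokens, so a map has N * N entries.
entries_pointwise = tokens_pointwise ** 2
entries_patched = tokens_patched ** 2

print(tokens_patched)                       # 63
print(entries_pointwise / entries_patched)  # roughly 66x fewer entries
```

Reducing the token count by a factor of about the stride therefore shrinks each attention map by about the stride squared, which is what leaves room for a longer look-back window.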

This model was contributed by [namctin](https://huggingface.co/namctin), [gsinthong](https://huggingface.co/gsinthong), [diepi](https://huggingface.co/diepi), [vijaye12](https://huggingface.co/vijaye12), [wmgifford](https://huggingface.co/wmgifford), and [kashif](https://huggingface.co/kashif). The original code can be found [here](https://github.com/yuqinie98/PatchTST).

## Usage tips[[usage-tips]]

The model can also be used for time series classification and time series regression. See the respective [PatchTSTForClassification](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTForClassification) and [PatchTSTForRegression](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTForRegression) classes.

## Resources[[resources]]

- A blog post explaining PatchTST in depth can be found [here](https://huggingface.co/blog/patchtst). The blog can also be opened in Google Colab.

## PatchTSTConfig[[transformers.PatchTSTConfig]][[transformers.PatchTSTConfig]]

#### transformers.PatchTSTConfig[[transformers.PatchTSTConfig]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/configuration_patchtst.py#L24)

This is the configuration class to store the configuration of a PatchTSTModel. It is used to instantiate a PatchTST
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the [ibm-granite/granite-timeseries-patchtst](https://huggingface.co/ibm-granite/granite-timeseries-patchtst) architecture.

Configuration objects inherit from [PreTrainedConfig](/docs/transformers/v5.6.0/ko/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the
documentation from [PreTrainedConfig](/docs/transformers/v5.6.0/ko/main_classes/configuration#transformers.PreTrainedConfig) for more information.

```python
>>> from transformers import PatchTSTConfig, PatchTSTModel

>>> # Initializing a PatchTST configuration with 12 time steps for prediction
>>> configuration = PatchTSTConfig(prediction_length=12)

>>> # Randomly initializing a model (with random weights) from the configuration
>>> model = PatchTSTModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```

**Parameters:**

num_input_channels (`int`, *optional*, defaults to `1`) : The number of input channels.

context_length (`int`, *optional*, defaults to 32) : The context length of the input sequence.

distribution_output (`str`, *optional*, defaults to `"student_t"`) : The distribution emission head for the model when loss is "nll". Could be either "student_t", "normal" or "negative_binomial".

loss (`str`, *optional*, defaults to `"mse"`) : The loss function for the model corresponding to the `distribution_output` head. For parametric distributions it is the negative log likelihood ("nll") and for point estimates it is the mean squared error "mse".

patch_length (`int`, *optional*, defaults to 1) : Define the patch length of the patchification process.

patch_stride (`int`, *optional*, defaults to 1) : Define the stride of the patchification process.

num_hidden_layers (`int`, *optional*, defaults to `3`) : Number of hidden layers in the Transformer encoder.

d_model (`int`, *optional*, defaults to `128`) : Size of the encoder layers and the pooler layer.

num_attention_heads (`int`, *optional*, defaults to 4) : Number of attention heads for each attention layer in the Transformer encoder.

share_embedding (`bool`, *optional*, defaults to `True`) : Sharing the input embedding across all channels.

channel_attention (`bool`, *optional*, defaults to `False`) : Activate channel attention block in the Transformer to allow channels to attend each other.

ffn_dim (`int`, *optional*, defaults to 512) : Dimension of the "intermediate" (often named feed-forward) layer in the Transformer encoder.

norm_type (`str` , *optional*, defaults to `"batchnorm"`) : Normalization at each Transformer layer. Can be `"batchnorm"` or `"layernorm"`.

norm_eps (`float`, *optional*, defaults to 1e-05) : A value added to the denominator for numerical stability of normalization.

attention_dropout (`Union[float, int]`, *optional*, defaults to `0.0`) : The dropout ratio for the attention probabilities.

positional_dropout (`float`, *optional*, defaults to 0.0) : The dropout probability in the positional embedding layer.

path_dropout (`float`, *optional*, defaults to 0.0) : The dropout path in the residual block.

ff_dropout (`float`, *optional*, defaults to 0.0) : The dropout probability used between the two layers of the feed-forward networks.

bias (`bool`, *optional*, defaults to `True`) : Whether to add bias in the feed-forward networks.

activation_function (`str`, *optional*, defaults to `"gelu"`) : The non-linear activation function (string) in the Transformer. `"gelu"` and `"relu"` are supported.

pre_norm (`bool`, *optional*, defaults to `True`) : Normalization is applied before self-attention if pre_norm is set to `True`. Otherwise, normalization is applied after residual block.

positional_encoding_type (`str`, *optional*, defaults to `"sincos"`) : Positional encodings. Options `"random"` and `"sincos"` are supported.

use_cls_token (`bool`, *optional*, defaults to `False`) : Whether cls token is used.

init_std (`float`, *optional*, defaults to 0.02) : The standard deviation of the truncated normal weight initialization distribution.

share_projection (`bool`, *optional*, defaults to `True`) : Sharing the projection layer across different channels in the forecast head.

scaling (`Union`, *optional*, defaults to `"std"`) : Whether to scale the input targets via "mean" scaler, "std" scaler or no scaler if `None`. If `True`, the scaler is set to "mean".

do_mask_input (`bool`, *optional*) : Apply masking during the pretraining.

mask_type (`str`, *optional*, defaults to `"random"`) : Masking type. Only `"random"` and `"forecast"` are currently supported.

random_mask_ratio (`float`, *optional*, defaults to 0.5) : Masking ratio applied to mask the input data during random pretraining.

num_forecast_mask_patches (`int` or `list`, *optional*, defaults to `[2]`) : Number of patches to be masked at the end of each batch sample. If it is an integer, all the samples in the batch will have the same number of masked patches. If it is a list, samples in the batch will be randomly masked by numbers defined in the list. This argument is only used for forecast pretraining.

channel_consistent_masking (`bool`, *optional*, defaults to `False`) : If channel consistent masking is True, all the channels will have the same masking pattern.

unmasked_channel_indices (`list`, *optional*) : Indices of channels that are not masked during pretraining. Values in the list are numbers between 1 and `num_input_channels`.

mask_value (`int`, *optional*, defaults to 0) : Values in the masked patches will be filled by `mask_value`.

pooling_type (`str`, *optional*, defaults to `"mean"`) : Pooling of the embedding. `"mean"`, `"max"` and `None` are supported.

head_dropout (`float`, *optional*, defaults to 0.0) : The dropout probability for head.

prediction_length (`int`, *optional*, defaults to 24) : The prediction horizon that the model will output.

num_targets (`int`, *optional*, defaults to 1) : Number of targets for regression and classification tasks. For classification, it is the number of classes.

output_range (`list`, *optional*) : Output range for regression task. The range of output values can be set to enforce the model to produce values within a range.

num_parallel_samples (`int`, *optional*, defaults to 100) : The number of samples generated in parallel for probabilistic prediction.

## PatchTSTModel[[transformers.PatchTSTModel]][[transformers.PatchTSTModel]]

#### transformers.PatchTSTModel[[transformers.PatchTSTModel]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/modeling_patchtst.py#L1078)

The bare PatchTST Model outputting raw hidden-states without any specific head on top.

This model inherits from [PreTrainedModel](/docs/transformers/v5.6.0/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the
library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)

This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
and behavior.

#### forward[[transformers.PatchTSTModel.forward]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/modeling_patchtst.py#L1097)

`forward(past_values: Tensor, past_observed_mask: torch.Tensor | None = None, future_values: torch.Tensor | None = None, output_hidden_states: bool | None = None, output_attentions: bool | None = None, return_dict: bool | None = None, **kwargs)`

- **past_values** (`torch.Tensor` of shape `(bs, sequence_length, num_input_channels)`, *required*) --
  Input sequence to the model
- **past_observed_mask** (`torch.BoolTensor` of shape `(batch_size, sequence_length, num_input_channels)`, *optional*) --
  Boolean mask to indicate which `past_values` were observed and which were missing. Mask values selected
  in `[0, 1]`:

  - 1 for values that are **observed**,
  - 0 for values that are **missing** (i.e. NaNs that were replaced by zeros).
- **future_values** (`torch.Tensor` of shape `(batch_size, prediction_length, num_input_channels)`, *optional*) --
  Future target values associated with the `past_values`
- **output_hidden_states** (`bool`, *optional*) --
  Whether or not to return the hidden states of all layers
- **output_attentions** (`bool`, *optional*) --
  Whether or not to return the output attention of all layers
- **return_dict** (`bool`, *optional*) --
  Whether or not to return a `ModelOutput` instead of a plain tuple.
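
The `past_observed_mask` convention above (observed values marked as 1, missing values replaced by zeros) can be prepared from data that contains NaNs. The toy tensor below is made up for illustration.

```python
import torch

# Toy batch of shape (batch_size, sequence_length, num_input_channels) with gaps
raw = torch.tensor(
    [[[1.0, 10.0], [float("nan"), 11.0], [3.0, float("nan")], [4.0, 13.0]]]
)

# True where a value was actually observed, False where it was missing
past_observed_mask = ~torch.isnan(raw)

# Replace the missing entries by zeros so the model never sees NaNs
past_values = torch.nan_to_num(raw, nan=0.0)

print(past_observed_mask.shape)
print(past_values[0, 1, 0].item())
```

Both tensors can then be passed together, for example `model(past_values=past_values, past_observed_mask=past_observed_mask)`, provided the model was configured for the same sequence length and number of channels.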

Examples:

```python
>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import PatchTSTModel

>>> file = hf_hub_download(
...     repo_id="hf-internal-testing/etth1-hourly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)

>>> model = PatchTSTModel.from_pretrained("namctin/patchtst_etth1_pretrain")

>>> # during training, one provides both past and future values
>>> outputs = model(
...     past_values=batch["past_values"],
...     future_values=batch["future_values"],
... )

>>> last_hidden_state = outputs.last_hidden_state
```

**Parameters:**

config ([PatchTSTConfig](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/v5.6.0/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.

**Returns:**

`PatchTSTModelOutput` or tuple of `torch.Tensor` (if `return_dict`=False or `config.return_dict`=False)

## PatchTSTForPrediction[[transformers.PatchTSTForPrediction]][[transformers.PatchTSTForPrediction]]

#### transformers.PatchTSTForPrediction[[transformers.PatchTSTForPrediction]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/modeling_patchtst.py#L1572)

The PatchTST model for prediction.

This model inherits from [PreTrainedModel](/docs/transformers/v5.6.0/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the
library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)

This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
and behavior.

#### forward[[transformers.PatchTSTForPrediction.forward]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/modeling_patchtst.py#L1602)

`forward(past_values: Tensor, past_observed_mask: torch.Tensor | None = None, future_values: torch.Tensor | None = None, output_hidden_states: bool | None = None, output_attentions: bool | None = None, return_dict: bool | None = None, **kwargs)`

- **past_values** (`torch.Tensor` of shape `(bs, sequence_length, num_input_channels)`, *required*) --
  Input sequence to the model
- **past_observed_mask** (`torch.BoolTensor` of shape `(batch_size, sequence_length, num_input_channels)`, *optional*) --
  Boolean mask to indicate which `past_values` were observed and which were missing. Mask values selected
  in `[0, 1]`:

  - 1 for values that are **observed**,
  - 0 for values that are **missing** (i.e. NaNs that were replaced by zeros).
- **future_values** (`torch.Tensor` of shape `(bs, forecast_len, num_input_channels)`, *optional*) --
  Future target values associated with the `past_values`
- **output_hidden_states** (`bool`, *optional*) --
  Whether or not to return the hidden states of all layers
- **output_attentions** (`bool`, *optional*) --
  Whether or not to return the output attention of all layers
- **return_dict** (`bool`, *optional*) --
  Whether or not to return a `ModelOutput` instead of a plain tuple.

Examples:

```python
>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import PatchTSTConfig, PatchTSTForPrediction

>>> file = hf_hub_download(
...     repo_id="hf-internal-testing/etth1-hourly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)

>>> # Prediction task with 7 input channels and prediction length is 96
>>> model = PatchTSTForPrediction.from_pretrained("namctin/patchtst_etth1_forecast")

>>> # during training, one provides both past and future values
>>> outputs = model(
...     past_values=batch["past_values"],
...     future_values=batch["future_values"],
... )

>>> loss = outputs.loss
>>> loss.backward()

>>> # during inference, one only provides past values, the model outputs future values
>>> outputs = model(past_values=batch["past_values"])
>>> prediction_outputs = outputs.prediction_outputs
```

**Parameters:**

config ([PatchTSTConfig](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/v5.6.0/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.

**Returns:**

`PatchTSTForPredictionOutput` or tuple of `torch.Tensor` (if `return_dict`=False or
`config.return_dict`=False)

## PatchTSTForClassification[[transformers.PatchTSTForClassification]][[transformers.PatchTSTForClassification]]

#### transformers.PatchTSTForClassification[[transformers.PatchTSTForClassification]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/modeling_patchtst.py#L1378)

The PatchTST model for classification.

This model inherits from [PreTrainedModel](/docs/transformers/v5.6.0/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the
library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)

This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
and behavior.

#### forward[[transformers.PatchTSTForClassification.forward]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/modeling_patchtst.py#L1393)

`forward(past_values: Tensor, target_values: torch.Tensor | None = None, past_observed_mask: bool | None = None, output_hidden_states: bool | None = None, output_attentions: bool | None = None, return_dict: bool | None = None, **kwargs)`

- **past_values** (`torch.Tensor` of shape `(bs, sequence_length, num_input_channels)`, *required*) --
  Input sequence to the model
- **target_values** (`torch.Tensor`, *optional*) --
  Labels associated with the `past_values`
- **past_observed_mask** (`torch.BoolTensor` of shape `(batch_size, sequence_length, num_input_channels)`, *optional*) --
  Boolean mask to indicate which `past_values` were observed and which were missing. Mask values selected
  in `[0, 1]`:

  - 1 for values that are **observed**,
  - 0 for values that are **missing** (i.e. NaNs that were replaced by zeros).
- **output_hidden_states** (`bool`, *optional*) --
  Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
  more detail.
- **output_attentions** (`bool`, *optional*) --
  Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
  tensors for more detail.
- **return_dict** (`bool`, *optional*) --
  Whether or not to return a [ModelOutput](/docs/transformers/v5.6.0/ko/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.

The [PatchTSTForClassification](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTForClassification) forward method overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.

A `PatchTSTForClassificationOutput` contains the following fields:

- **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `target_values` is provided) -- Classification loss (cross-entropy between `prediction_logits` and `target_values`).
- **prediction_logits** (`torch.FloatTensor` of shape `(batch_size, num_targets)`) -- Prediction scores of the PatchTST modeling head (scores before SoftMax).
- **hidden_states** (`tuple[torch.FloatTensor]`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, +
  one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`.

  Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- **attentions** (`tuple[torch.FloatTensor]`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
  sequence_length)`.

  Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
  heads.

Examples:

```python
>>> import torch
>>> from transformers import PatchTSTConfig, PatchTSTForClassification

>>> # classification task with two input channels and 3 classes
>>> config = PatchTSTConfig(
...     num_input_channels=2,
...     num_targets=3,
...     context_length=512,
...     patch_length=12,
...     patch_stride=12,
...     use_cls_token=True,
... )
>>> model = PatchTSTForClassification(config=config)

>>> # during inference, one only provides past values
>>> past_values = torch.randn(20, 512, 2)
>>> outputs = model(past_values=past_values)
>>> labels = outputs.prediction_logits
```

**Parameters:**

config ([PatchTSTConfig](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/v5.6.0/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.

**Returns:**

`PatchTSTForClassificationOutput` or `tuple(torch.FloatTensor)`

A `PatchTSTForClassificationOutput` or a tuple of
`torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various
elements depending on the configuration ([PatchTSTConfig](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTConfig)) and inputs.

## PatchTSTForPretraining[[transformers.PatchTSTForPretraining]][[transformers.PatchTSTForPretraining]]

#### transformers.PatchTSTForPretraining[[transformers.PatchTSTForPretraining]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/modeling_patchtst.py#L1224)

The PatchTST model for pretraining.

This model inherits from [PreTrainedModel](/docs/transformers/v5.6.0/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the
library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)

This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
and behavior.

#### forward[[transformers.PatchTSTForPretraining.forward]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/modeling_patchtst.py#L1235)

`forward(past_values: Tensor, past_observed_mask: torch.Tensor | None = None, output_hidden_states: bool | None = None, output_attentions: bool | None = None, return_dict: bool | None = None, **kwargs)`

- **past_values** (`torch.Tensor` of shape `(bs, sequence_length, num_input_channels)`, *required*) --
  Input sequence to the model
- **past_observed_mask** (`torch.BoolTensor` of shape `(batch_size, sequence_length, num_input_channels)`, *optional*) --
  Boolean mask to indicate which `past_values` were observed and which were missing. Mask values selected
  in `[0, 1]`:

  - 1 for values that are **observed**,
  - 0 for values that are **missing** (i.e. NaNs that were replaced by zeros).
- **output_hidden_states** (`bool`, *optional*) --
  Whether or not to return the hidden states of all layers
- **output_attentions** (`bool`, *optional*) --
  Whether or not to return the output attention of all layers
- **return_dict** (`bool`, *optional*) -- Whether or not to return a `ModelOutput` instead of a plain tuple.

Examples:

```python
>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import PatchTSTConfig, PatchTSTForPretraining

>>> file = hf_hub_download(
...     repo_id="hf-internal-testing/etth1-hourly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)

>>> # Config for random mask pretraining
>>> config = PatchTSTConfig(
...     num_input_channels=7,
...     context_length=512,
...     patch_length=12,
...     patch_stride=12,
...     mask_type='random',
...     random_mask_ratio=0.4,
...     use_cls_token=True,
... )
>>> # Config for forecast mask pretraining
>>> config = PatchTSTConfig(
...     num_input_channels=7,
...     context_length=512,
...     patch_length=12,
...     patch_stride=12,
...     mask_type='forecast',
...     num_forecast_mask_patches=5,
...     use_cls_token=True,
... )
>>> model = PatchTSTForPretraining(config)

>>> # during pretraining, one only provides past values; the model masks patches and reconstructs them
>>> outputs = model(past_values=batch["past_values"])

>>> loss = outputs.loss
>>> loss.backward()
```
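
The idea behind `mask_type='forecast'`, `num_forecast_mask_patches` and `mask_value` can be sketched in plain PyTorch: the last few patches of every channel are overwritten, and the reconstruction error is measured on those patches only. The helpers `mask_last_patches` and `masked_mse` are hypothetical names written for this illustration and are not the library's implementation.

```python
import torch


def mask_last_patches(patches: torch.Tensor, num_masked: int, mask_value: float = 0.0):
    """Mask the final `num_masked` patches of every channel (illustrative helper).

    patches: (batch_size, num_channels, num_patches, patch_length)
    Returns the masked patches and a boolean mask of shape
    (batch_size, num_channels, num_patches) that is True where a patch was masked.
    """
    if not 0 < num_masked <= patches.shape[2]:
        raise ValueError("num_masked must be between 1 and num_patches")
    mask = torch.zeros(patches.shape[:-1], dtype=torch.bool)
    mask[..., -num_masked:] = True
    # Broadcast the per-patch mask over the patch_length dimension
    masked = patches.masked_fill(mask.unsqueeze(-1), mask_value)
    return masked, mask


def masked_mse(prediction: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean squared error averaged over the masked patches only."""
    per_patch = ((prediction - target) ** 2).mean(dim=-1)
    return (per_patch * mask).sum() / mask.sum()


patches = torch.randn(2, 7, 42, 12)  # 42 patches of length 12 for 7 channels
masked, mask = mask_last_patches(patches, num_masked=5)

# A trivial "reconstruction" of all zeros, just to exercise the loss
loss = masked_mse(torch.zeros_like(patches), patches, mask)
print(mask.sum().item(), loss.item())
```

Random masking follows the same pattern, except that the masked positions are drawn at random according to `random_mask_ratio` instead of being taken from the end of the sequence.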

**Parameters:**

config ([PatchTSTConfig](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/v5.6.0/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.

**Returns:**

`PatchTSTForPretrainingOutput` or tuple of `torch.Tensor` (if `return_dict`=False or
`config.return_dict`=False)

## PatchTSTForRegression[[transformers.PatchTSTForRegression]][[transformers.PatchTSTForRegression]]

#### transformers.PatchTSTForRegression[[transformers.PatchTSTForRegression]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/modeling_patchtst.py#L1821)

The PatchTST model for regression.

This model inherits from [PreTrainedModel](/docs/transformers/v5.6.0/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the
library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)

This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
and behavior.

#### forward[[transformers.PatchTSTForRegression.forward]]

[Source](https://github.com/huggingface/transformers/blob/v5.6.0/src/transformers/models/patchtst/modeling_patchtst.py#L1848)

`forward(past_values: Tensor, target_values: torch.Tensor | None = None, past_observed_mask: torch.Tensor | None = None, output_hidden_states: bool | None = None, output_attentions: bool | None = None, return_dict: bool | None = None, **kwargs)`

- **past_values** (`torch.Tensor` of shape `(bs, sequence_length, num_input_channels)`, *required*) --
  Input sequence to the model
- **target_values** (`torch.Tensor` of shape `(bs, num_targets)`, *optional*) --
  Target values associated with the `past_values`
- **past_observed_mask** (`torch.BoolTensor` of shape `(batch_size, sequence_length, num_input_channels)`, *optional*) --
  Boolean mask to indicate which `past_values` were observed and which were missing. Mask values selected
  in `[0, 1]`:

  - 1 for values that are **observed**,
  - 0 for values that are **missing** (i.e. NaNs that were replaced by zeros).
- **output_hidden_states** (`bool`, *optional*) --
  Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
  more detail.
- **output_attentions** (`bool`, *optional*) --
  Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
  tensors for more detail.
- **return_dict** (`bool`, *optional*) --
  Whether or not to return a [ModelOutput](/docs/transformers/v5.6.0/ko/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.

The [PatchTSTForRegression](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTForRegression) forward method overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.

A `PatchTSTForRegressionOutput` contains the following fields:

- **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `target_values` is provided) -- MSE loss.
- **regression_outputs** (`torch.FloatTensor` of shape `(batch_size, num_targets)`) -- Regression outputs of the time series modeling heads.
- **hidden_states** (`tuple[torch.FloatTensor]`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, +
  one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`.

  Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- **attentions** (`tuple[torch.FloatTensor]`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
  sequence_length)`.

  Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
  heads.

Examples:

```python
>>> import torch
>>> from transformers import PatchTSTConfig, PatchTSTForRegression

>>> # Regression task with 6 input channels and regress 2 targets
>>> model = PatchTSTForRegression.from_pretrained("namctin/patchtst_etth1_regression")

>>> # during inference, one only provides past values, the model outputs the regression targets
>>> past_values = torch.randn(20, 512, 6)
>>> outputs = model(past_values=past_values)
>>> regression_outputs = outputs.regression_outputs
```

**Parameters:**

config ([PatchTSTConfig](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/v5.6.0/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.

**Returns:**

`PatchTSTForRegressionOutput` or `tuple(torch.FloatTensor)`

A `PatchTSTForRegressionOutput` or a tuple of
`torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various
elements depending on the configuration ([PatchTSTConfig](/docs/transformers/v5.6.0/ko/model_doc/patchtst#transformers.PatchTSTConfig)) and inputs.

