  1. What does Keras Tokenizer method exactly do? - Stack Overflow

    On occasion, circumstances require us to do the following: from keras.preprocessing.text import Tokenizer tokenizer = Tokenizer(num_words=my_max) Then, invariably, we chant this mantra: …
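    A minimal sketch of the usual Keras workflow the question alludes to, assuming a Keras/TensorFlow version where `keras.preprocessing.text` is still available (it is deprecated in newer releases); the texts are illustrative.

    ```python
    from keras.preprocessing.text import Tokenizer

    texts = ["the cat sat on the mat", "the dog ate my homework"]

    # num_words caps how many of the most frequent words are kept
    # when texts are later converted to sequences
    tokenizer = Tokenizer(num_words=1000)
    tokenizer.fit_on_texts(texts)        # builds the word -> index vocabulary
    print(tokenizer.word_index)          # e.g. {'the': 1, 'cat': 2, ...}

    sequences = tokenizer.texts_to_sequences(texts)
    print(sequences)                     # each text becomes a list of word indices
    ```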

  2. TypeError in SFTTrainer Initialization: Unexpected Keyword Argument ...

    Mar 31, 2025 · I tried passing tokenizer inside training_arguments, but that didn't work either. Question: Is SFTTrainer expecting the tokenizer to be handled differently in the latest versions of trl? How …
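    One plausible cause, hedged: recent trl releases follow transformers' Trainer in renaming the `tokenizer` keyword to `processing_class`, which would produce exactly this TypeError on older example code. A sketch of the newer call shape, with a placeholder model and dataset; confirm against the release notes of the trl version you have installed.

    ```python
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import SFTConfig, SFTTrainer

    model_name = "gpt2"                                   # placeholder model for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token             # gpt2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(model_name)
    dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset; substitute your own

    trainer = SFTTrainer(
        model=model,
        args=SFTConfig(output_dir="./sft-out"),
        train_dataset=dataset,
        processing_class=tokenizer,                       # newer name for the old `tokenizer=` kwarg
    )
    ```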

  3. Looking for a clear definition of what a "tokenizer", "parser" and ...

    A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines). A lexer is basically a tokenizer, but it usually attaches extra context to the tokens -- this token …
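    A toy illustration of the distinction the answer draws, as a sketch: the tokenizer only splits on whitespace, while the lexer attaches a kind/type to each token.

    ```python
    import re

    def tokenize(text):
        # a tokenizer just splits the stream on whitespace (tabs, spaces, newlines)
        return re.split(r"\s+", text.strip())

    def lex(text):
        # a lexer additionally attaches context (a token kind) to each token
        kinds = [("NUMBER", r"\d+"), ("IDENT", r"[A-Za-z_]\w*"), ("OP", r"[=+\-*/()]")]
        for tok in tokenize(text):
            for kind, pattern in kinds:
                if re.fullmatch(pattern, tok):
                    yield (kind, tok)
                    break

    print(list(lex("x = 42")))   # [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42')]
    ```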

  4. Huggingface saving tokenizer - Stack Overflow

    Oct 27, 2020 · Make sure that: - './models/tokenizer/' is a correct model identifier listed on 'https://huggingface.co/models' - or './models/tokenizer/' is the correct path to a directory containing a …
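    That error generally means the directory handed to `from_pretrained` does not contain the files `save_pretrained` writes (tokenizer_config.json, vocab/merges or tokenizer.json). A sketch of the round trip with a reasonably recent transformers; the path is the one from the question.

    ```python
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    tokenizer.save_pretrained("./models/tokenizer/")   # writes the tokenizer files into this directory

    reloaded = AutoTokenizer.from_pretrained("./models/tokenizer/")
    print(reloaded.tokenize("Saving and reloading a tokenizer"))
    ```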

  5. How to save a tokenizer after training it? - Stack Overflow

    Aug 12, 2021 · I have just followed this tutorial on how to train my own tokenizer. Now, from training my tokenizer, I have wrapped it inside a Transformers object, so that I can use it with the transformers libr...
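    A sketch of that wrapping step, assuming the tokenizer was trained with the `tokenizers` library: wrap it in `PreTrainedTokenizerFast`, then save and reload it like any transformers tokenizer. Vocabulary size, special tokens, and the training texts below are illustrative.

    ```python
    from tokenizers import Tokenizer, models, pre_tokenizers, trainers
    from transformers import AutoTokenizer, PreTrainedTokenizerFast

    # train a small BPE tokenizer with the `tokenizers` library
    raw_tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
    raw_tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(vocab_size=1000, special_tokens=["[UNK]", "[PAD]"])
    raw_tokenizer.train_from_iterator(["some training text", "more text"], trainer=trainer)

    # wrap it so the transformers library can use and persist it
    wrapped = PreTrainedTokenizerFast(
        tokenizer_object=raw_tokenizer, unk_token="[UNK]", pad_token="[PAD]"
    )
    wrapped.save_pretrained("./my-tokenizer")            # writes tokenizer.json + tokenizer_config.json

    reloaded = AutoTokenizer.from_pretrained("./my-tokenizer")
    ```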

  6. OpenAI API: How do I count tokens before(!) I send an API request?

    Mar 21, 2023 · How do I count tokens before (!) I send an API request? As stated in the official OpenAI article: To further explore tokenization, you can use our interactive Tokenizer tool, which allows you …
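    The offline equivalent of that interactive tool is tiktoken; a minimal sketch of counting tokens locally before sending a request. Note that chat-completion requests add a small per-message overhead on top of the raw text tokens, so treat this as an estimate.

    ```python
    import tiktoken

    text = "How many tokens will this prompt use?"
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")   # or tiktoken.get_encoding("cl100k_base")
    print(len(enc.encode(text)))                         # number of tokens the model will see for this text
    ```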

  7. How to add new tokens to an existing Huggingface tokenizer?

    May 8, 2023 · And then it points to the train_new_from_iterator() function in Chapter 7 but I can't seem to find reference to how to use it to extend the tokenizer without re-training it.
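    Extending a tokenizer without retraining it is also possible with `add_tokens` / `add_special_tokens`, provided the model's embedding matrix is resized afterwards; a sketch, with illustrative token strings.

    ```python
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    num_added = tokenizer.add_tokens(["covid19", "mrna"])                      # regular new tokens
    tokenizer.add_special_tokens({"additional_special_tokens": ["<sep_doc>"]}) # new special token

    # new embedding rows are randomly initialised and should be fine-tuned
    model.resize_token_embeddings(len(tokenizer))
    ```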

  8. How to load the saved tokenizer from pretrained model

    Oct 16, 2019 · I fine-tuned a pretrained BERT model in Pytorch using huggingface transformer. All the training/validation is done on a GPU in cloud. At the end of the training, I save the model and …
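    A sketch of the save/reload pattern for that setup: write the fine-tuned model and its tokenizer into the same directory, then load both back from it (the output directory name is hypothetical).

    ```python
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # ... fine-tune on the GPU ...

    save_dir = "./finetuned-bert"
    model.save_pretrained(save_dir)
    tokenizer.save_pretrained(save_dir)   # saving the tokenizer alongside the model is the step often missed

    # later, possibly on a different machine
    model = AutoModelForSequenceClassification.from_pretrained(save_dir)
    tokenizer = AutoTokenizer.from_pretrained(save_dir)
    ```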

  9. python - AutoTokenizer.from_pretrained fails to load locally saved ...

    from transformers import AutoTokenizer, AutoConfig tokenizer = AutoTokenizer.from_pretrained('distilroberta-base') config = AutoConfig.from_pretrained('distilroberta …
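    Continuing the snippet's code, a sketch of a common workaround: save the config next to the tokenizer files so AutoTokenizer has enough metadata to pick the right class, then load from the local path. Whether the config is strictly required depends on the transformers version.

    ```python
    from transformers import AutoConfig, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
    config = AutoConfig.from_pretrained("distilroberta-base")

    tokenizer.save_pretrained("./models/tokenizer/")
    config.save_pretrained("./models/tokenizer/")     # writes config.json next to the tokenizer files

    reloaded = AutoTokenizer.from_pretrained("./models/tokenizer/", local_files_only=True)
    ```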

  10. how to use tiktoken in offline mode computer - Stack Overflow

    Apr 26, 2023 · First, let's grab the tokenizer blob URL from the source on your remote machine. If we trace the get_encoding function, we find it calls a function from tiktoken_ext.openai_public which has …
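    A sketch of the workaround that answer describes: pre-download the encoding blob on a connected machine, store it under the SHA-1 hash of its URL, and point TIKTOKEN_CACHE_DIR at that directory on the offline machine. The cl100k_base URL and cache-key scheme reflect tiktoken's loader at the time of writing, so verify them against the source of your installed version.

    ```python
    import hashlib
    import os

    # blob URL referenced by tiktoken_ext.openai_public for cl100k_base (check your version)
    blob_url = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
    cache_dir = "./tiktoken_cache"
    os.makedirs(cache_dir, exist_ok=True)

    # tiktoken looks the file up under the SHA-1 of the blob URL inside TIKTOKEN_CACHE_DIR
    cache_key = hashlib.sha1(blob_url.encode()).hexdigest()
    print("save the downloaded blob as:", os.path.join(cache_dir, cache_key))

    # on the offline machine, with the file in place:
    os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir
    import tiktoken
    enc = tiktoken.get_encoding("cl100k_base")
    ```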