State-of-the-Art Transformer Pipelines in spaCy

November 10, 2023 from 16:45 to 17:15

Speakers: Daniël de Kok & Madeeswaran Kannan

spaCy is a Python library for building natural language processing pipelines. You can use spaCy for a wide variety of text-related tasks, such as identifying named entities, labeling spans, classifying documents, or uncovering the syntactic structure of text.

The modular architecture of spaCy makes it possible to use many kinds of neural network models, such as convolutional networks from Thinc, transformer models from Curated Transformers or Hugging Face Transformers, and a range of large language models (LLMs) through spacy-llm.

In this talk, we will show you how to use transformer models, from pretrained encoder models such as XLM-RoBERTa to large language models like Llama 2, to build state-of-the-art pipelines for text annotation tasks such as named entity recognition.

Daniël is a machine learning engineer at Explosion. He has worked on neural network models for natural language processing for over a decade. He enjoys everything from building new language processing components to squeezing the last bit of performance out of CUDA kernels.

Madeesh is a machine learning engineer at Explosion who likes to tinker with the low-level nuts and bolts of code, be it neural networks or otherwise. He has also worked as an NLP researcher on language acquisition and question generation.