Discover our services

At Stokes AI, we specialize in delivering high-quality AI training data in formats that integrate seamlessly with your systems and workflows. We support industry-standard file types such as JSON, JSONL, CSV, and TSV for model evaluations and labeled data. We also accommodate secure, portal-based workflows in which our team works directly within client environments or model sandboxes to test outputs and deliver precise, aligned human feedback.

Preference Ranking

We evaluate, rank, and compare AI-generated responses using structured frameworks, including Likert scales and multi-dimensional labeling. Each output can be rated on attributes such as truthfulness, instruction following, verbosity, and harmlessness. This granular feedback helps models learn not just which response is “best” but why it is preferred, pushing them toward more helpful, safe, and human-aligned behavior.
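To illustrate how multi-dimensional ratings can travel in one of the formats above, here is a minimal Python sketch of a single JSONL preference record and a loader that checks it. The field names (`prompt_id`, `responses`, `ratings`, `preferred`, `rationale`), the 1 to 5 Likert scale, and the equal-weight average are all assumptions for illustration, not a fixed Stokes AI delivery schema.

```python
import json
from statistics import mean

# Attributes named in the service description. The 1-5 scale is an assumption,
# as is "higher is better" for every attribute (e.g. verbosity scored as
# "appropriately concise" rather than "longer").
ATTRIBUTES = ("truthfulness", "instruction_following", "verbosity", "harmlessness")
LIKERT_MIN, LIKERT_MAX = 1, 5

# One hypothetical JSONL line: a prompt, two candidate responses, per-attribute
# ratings for each response, and the annotator's overall preference and rationale.
EXAMPLE_JSONL = """\
{"prompt_id": "p-001", "prompt": "Explain what a JSONL file is.", "responses": {"A": "One JSON object per line.", "B": "A spreadsheet format."}, "ratings": {"A": {"truthfulness": 5, "instruction_following": 5, "verbosity": 4, "harmlessness": 5}, "B": {"truthfulness": 1, "instruction_following": 3, "verbosity": 4, "harmlessness": 5}}, "preferred": "A", "rationale": "B is factually wrong."}
"""


def load_records(text):
    """Parse JSONL: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def validate(record):
    """Check that every rated response has an in-range score for every attribute."""
    for label, scores in record["ratings"].items():
        for attr in ATTRIBUTES:
            score = scores[attr]  # raises KeyError if an attribute is missing
            if not LIKERT_MIN <= score <= LIKERT_MAX:
                raise ValueError(f"{label}.{attr}={score} outside {LIKERT_MIN}-{LIKERT_MAX}")
    if record["preferred"] not in record["responses"]:
        raise ValueError("preferred label does not match any response")


def mean_score(scores):
    """Unweighted mean across attributes (equal weighting is an assumption)."""
    return mean(scores[attr] for attr in ATTRIBUTES)


records = load_records(EXAMPLE_JSONL)
for rec in records:
    validate(rec)
    summary = {label: mean_score(s) for label, s in rec["ratings"].items()}
    print(rec["prompt_id"], summary, "preferred:", rec["preferred"])
```

Keeping the per-attribute scores alongside the overall preference is what lets a training pipeline see why one response won; averaging them is only one option, and a project could just as well weight harmlessness more heavily or keep the attributes separate.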

Adversarial Prompting

Our team specializes in writing adversarial prompts and LLM jailbreak attempts designed to expose model vulnerabilities by eliciting hallucinations, logical inconsistencies, or policy-breaking behavior. These stress tests are essential for identifying blind spots and edge cases, helping ensure that models behave reliably even in challenging or unexpected situations.

Domain-Specific Expertise

Our team includes domain experts in fields such as mathematics, psychology, finance, science, and history. These reviewers evaluate AI responses within their areas of expertise for factual accuracy, truthfulness, and conceptual depth. This helps keep model behavior robust and trustworthy in technically or culturally nuanced areas that require more than surface-level understanding.

Classification/Categorization

We tag and categorize AI outputs based on content type, tone, intent, or safety level using structured taxonomies and annotation schemas. Tasks include identifying whether a response is factual or speculative, sarcastic or sincere, or safe or harmful. This structured labeling supports fine-tuning and model evaluation pipelines that depend on clean, reliable classification data.
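As a rough sketch of what a structured annotation schema can look like in practice, the Python example below encodes the label pairs mentioned above as enums, so that any label outside the taxonomy is rejected at load time. The class names, field names, and response IDs are hypothetical; a real project would use the client's own taxonomy.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical taxonomy built from the label pairs named in the text.
class Factuality(Enum):
    FACTUAL = "factual"
    SPECULATIVE = "speculative"


class Tone(Enum):
    SINCERE = "sincere"
    SARCASTIC = "sarcastic"


class Safety(Enum):
    SAFE = "safe"
    HARMFUL = "harmful"


@dataclass(frozen=True)
class Annotation:
    response_id: str
    factuality: Factuality
    tone: Tone
    safety: Safety


def parse_annotation(raw: dict) -> Annotation:
    """Convert a raw label dict into an Annotation.

    Each Enum lookup raises ValueError if the label is not in the taxonomy,
    so malformed labels never reach a training or evaluation pipeline.
    """
    return Annotation(
        response_id=raw["response_id"],
        factuality=Factuality(raw["factuality"]),
        tone=Tone(raw["tone"]),
        safety=Safety(raw["safety"]),
    )


good = parse_annotation(
    {"response_id": "r-17", "factuality": "speculative", "tone": "sincere", "safety": "safe"}
)
print(good)

try:
    # "ironic" is not a label in this hypothetical taxonomy.
    parse_annotation(
        {"response_id": "r-18", "factuality": "factual", "tone": "ironic", "safety": "safe"}
    )
except ValueError as err:
    print("rejected:", err)
```

Validating against a closed label set is one simple way to get the "clean, reliable classification data" that downstream pipelines depend on.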

Safety and Alignment Evaluation

We review outputs for harmful, biased, or policy-violating content, helping ensure that models adhere to safety guidelines and alignment objectives. Tasks include red-teaming, detecting subtle toxicity or stereotyping, and evaluating whether a model behaves as intended. This is a critical part of building AI systems that are ethical, inclusive, and aligned with human values.

Supervised Fine-Tuning (SFT) Tasks

We generate or refine high-quality instruction-response pairs that teach models how to carry out complex tasks, answer factual questions, and walk users through reasoning steps. This includes writing ideal completions, correcting flawed outputs, and demonstrating expert behavior across diverse domains. These curated datasets form the foundation for models that are helpful, honest, and informative out of the box.

Factual Verification

We assess the factual accuracy of model responses by verifying claims, checking for hallucinations, and flagging errors. Where needed, we provide corrections or references that demonstrate what a truthful response should look like. This is especially important when fine-tuning models for scientific, academic, or other high-stakes settings where precision and trustworthiness matter most.

Chain of Thought Reasoning

We evaluate multi-step model outputs for logical soundness, accuracy of intermediate steps, and mathematical rigor. Our annotators trace the AI’s step-by-step reasoning in tasks such as complex problem solving, logical deduction, and multi-part questions, identifying where the model’s reasoning succeeds or breaks down. This feedback helps models build more coherent and verifiable reasoning chains.
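To make step-by-step tracing more concrete, here is a minimal Python sketch of how step-level judgments might be recorded and how the first point of failure can be located. The `StepLabel` record, the verdict labels, and the worked example are hypothetical illustrations, not a prescribed annotation format.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical set of verdicts an annotator can assign to each reasoning step.
VERDICTS = {"correct", "incorrect", "unverifiable"}


@dataclass
class StepLabel:
    index: int      # 1-based position of the step in the model's reasoning chain
    text: str       # the step as the model wrote it
    verdict: str    # annotator judgment, one of VERDICTS
    note: str = ""  # optional explanation or correction


def first_breakdown(steps: List[StepLabel]) -> Optional[StepLabel]:
    """Return the earliest step not judged correct, or None if the chain holds.

    Steps are visited in index order; an unknown verdict encountered along the
    way raises ValueError.
    """
    for step in sorted(steps, key=lambda s: s.index):
        if step.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict {step.verdict!r} at step {step.index}")
        if step.verdict != "correct":
            return step
    return None


# Hypothetical annotated chain for a simple rate problem.
chain = [
    StepLabel(1, "A train travels 120 km in 2 hours.", "correct"),
    StepLabel(2, "Its average speed is 120 / 2 = 60 km/h.", "correct"),
    StepLabel(3, "In 3 hours it covers 60 * 3 = 160 km.", "incorrect", "60 * 3 = 180"),
    StepLabel(4, "So the answer is 160 km.", "incorrect", "follows from the step 3 error"),
]

bad = first_breakdown(chain)
if bad is not None:
    print(f"Reasoning breaks down at step {bad.index}: {bad.note}")
```

Pinpointing the first faulty step, rather than only marking the final answer wrong, is what distinguishes step-level feedback: step 4 here is wrong only because step 3 is.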

Other AI Training Not Listed Here

Here at Stokes AI, we understand that AI training needs are often unique and evolving. If your project requires specialized data annotation, custom evaluation tasks, or any AI training work not listed above (such as agentic AI, foreign-language translation, NLP, audio transcription, or interpretation), don’t hesitate to reach out. Our team is flexible and ready to collaborate closely with you to design tailored solutions that meet your specific requirements and help advance your AI models.

Ready to work with us? Get a Quote