Since September, I have had the distinct pleasure of working part-time for a company based in California that offers remote, credit-bearing middle and high school courses in world languages. LanguageBird is a perfect fit for a retired public school teacher, and I am very content working for them (not a paid promotion).
The pandemic placed us public school teachers in the position of teaching remotely, some for the first time. In some places that went poorly; in others it went pretty well. My remote teaching work now has given me the chance to re-explore online teaching practice and the kinds of 21st century learning spaces that meet the needs of that situation.
Besides my work for LanguageBird, I am also enjoying teaching a remote French class for the public school district from which I retired last June. It is very different from LanguageBird in many ways, and teaching in both contexts has provided a wealth of interesting experiences that I feel are instructive. In this series of posts, I would like to share my experiences and conclusions, as well as the apps I am developing to support remote teaching.
In the public school remote teaching context, we set it up as a daily synchronous class, informed by our pandemic experience that asynchronous courses are a bad idea for most adolescents. It is a small class of five: two for French III and three for college French, credited through a local community college, which approved my application to work as an adjunct for them in a high school. Each school day during period 2, I fire up a Google Meet and students log in. They are supervised on-site by a language teacher (Spanish).
We (administration and I) were concerned that remote teaching made it difficult to maintain the kind of teacher-student rapport that was so necessary for learning. I suggested that I work in-person for a half day at the end of each marking period (ten weeks) to teach a class and meet with students individually so they can present their projects and practice French conversation. (The district is a 45-mile commute for me one-way, so going in-person for one daily class was not practical.)
At LanguageBird, we only teach one-on-one lessons. I find this extremely effective, so from the start I modified my public school lesson plans so that I would teach whole-group for only the first 15 minutes, and then each student would have an individual “tutorial” with me for the balance of the time. This turned out to be a fantastic idea, and I am guessing the students like it too.
During the pandemic in my district, we had two days to launch into teaching by video-conference (Here is a post on my experience teaching during the pandemic). My current students, many of whom were then in my sixth grade social studies class back in 2020, had a mostly negative experience learning online in general. I felt strongly motivated to demonstrate from the start of the school year that this remote learning experience would not be like that. The first upgrade I made to what I was doing in 2020 was to focus on individual lessons over group lessons.
I think of positive rapport as a trusting sense of mutual goodwill between an instructor and a pupil. Building positive rapport with students is extremely important, and I had the sense that this was possible only to a very limited degree in remote learning. However, I now stand corrected, with one caveat: in remote instruction over video-conferencing, it is necessary to favor one-on-one teaching situations.
Fostering positive rapport extends beyond just being present to interact one-on-one. It is also built on online software applications that foster efficient and readily accessible learning interactions for delivery, practice, evaluation, and debriefing.
Next post: teaching composition to world language students remotely.
InnovationAssessments.com brings to mind a testing service. And so it once was!
But the pandemic spurred its growth toward the full online teaching platform that it is today!
Not to disparage our educational testing apps: this platform started out twenty-odd years ago as a test generator for multiple-choice tests. The test generators, test question bank management apps, secure online testing, and algorithmic AI-assisted scoring of short answer tests and summaries make it a powerful tool in your teacher’s toolbox.
While we kept the name (Do you realize how hard it is to rebrand a website?), we are more than our name! I invite the reader to explore the collection of apps that makes this a top-notch teaching platform.
For starters, Innovation evolved under the demands of real teachers and students in real classrooms. As my colleagues and students shared suggestions for apps and upgrades over a decade, I modified and adapted the software. My coding students were assigned to try to hack it; I built defenses. My teacher colleagues had lots of ideas for ensuring assessment integrity. We collaborated to build something reliable and intuitive to use.
It is difficult to compose a promo for Innovation because by the time we list all the features, we have lost the attention of the reader.
The first thing that I did not like about Google Classroom was how clumsy it was to communicate assignments to kids. I don’t like the comment stream approach. And what about students who are ahead and want to see what tomorrow holds? The planning app at Innovation is the first thing students see when they navigate to their course playlist. They see this week and all the assignments. They can jump to them from the planner or scroll down the playlist. There is a custom note option just for today’s lesson.
It is important to me to be able to curate my resources effectively: my links to assignments on and off Innovation. I want to be able to hide things until the time is right, lock tests with a key code, schedule the visibility of tasks for the best moment, and so forth. Innovation has all of these features and more for efficient curating of class online resources: link to Google Docs and websites of any kind, manage who has access and when, and hide unit plans for next school year.
But it is also easy if you curate your resources elsewhere!
Many subscribers to Innovation curate their links in Google Classroom or other platforms. Well, most use Google Classroom… But that’s okay! Innovation is a verified app on the Google system. You can embed a link to your Innovation task in Google Classroom, and after a quick authentication step, students are engaged in the day’s lesson.
The proctor at Innovation is an algorithmic AI that monitors and reports on student activity. My colleagues and I wanted to know whether students were really watching our video assignments or whether they were pasting in text in some places; we wanted to know how long it took students to do a task and when they logged in and from where. Proctor AI gives detailed information about what students are doing online in your 21st century learning space.
The Tutor app and the flashcard system are perfect for drilling terms and facts.
The algorithmic AI coach advises students on short answers, summaries, and outlines based on a corpus of models on which it was trained.
Modifications are easily made to drills to accommodate special needs.
Innovation has what your students need to study and what teachers need to scaffold their objectives to individual needs.
The Etude is our favorite app. Teachers embed a lesson in video, PDF, and/or audio format and include guiding questions. There is note-taking space for students. The Etude is the perfect tool to curate deliverables online in 1:1 laptop classrooms or in remote teaching situations.
Manage merit badges to chart progress. Invent your own or import ours. As students earn new skills, automated badge awards provide visual evidence of progress.
Students love playing this Jeopardy-like game. It is easy for teachers to generate games from their test question banks. Suitable for in-person or remote learning situations, the Ventura game is a hit with students.
Why not try it out?
Look, I know we’re a small startup company, unknown and not really able to compete with Google, Schoology, and the big names.
This platform has a lot going for it.
Innovation is free for teachers working for non-profits and not-for-profit organizations. I invite you to sign up for a free account to try it out!
In response to teacher requests over the years, there are a number of different ways to add student sub-accounts to your virtual classroom at Innovation.
It’s best to just have students register themselves in your virtual classroom. There is a link provided on the Students tab in your teacher dashboard. The links are on the left.
One link is generic, allowing students to create sub-accounts using a password or using Google Sign-in.
One link is specific to Google Sign-in. Send this one if no one is using username-password. You can always switch later if you want.
Import Student Roster from Google Classroom
If you are integrating Innovation tasks into your Google Classroom, you can import the rosters.
Adding Students Manually
Teachers can just add students manually using the Add Student app. The app will create a unique username from the student’s first and last names. You can optionally assign students to a class. More about enrollment below…
Adding Students via Spreadsheet
Teachers can upload a spreadsheet of student data and let Innovation create the accounts from there. Teachers who do this must use the XLSX template provided.
Expiration Dates on Student Accounts: Commercial Licenses versus Free Licenses
Teachers who have a commercial license to Innovation can manage expiration dates on student accounts. This is useful, for example, if you are working for a tutoring service and you are charging students to access your virtual classroom for a certain period of time.
Teachers with Free Licenses do not have expiration dates on student accounts and need to manage them manually.
When a new student account is added under a commercial license, it is created with a two-day expiration by default; teachers need to add time. Manage student account expiration dates in Students :: Student Account Expiry.
How Students Access Innovation
There are three ways students can log in to Innovation:
Google Sign-in
If you are using Google Classroom and plan to integrate Innovation tasks into that platform, this is your best choice. Students will already have Google accounts and can sign in without remembering an extra password.
PIN + Virtual Room Number
Personal Identification Numbers (PINs) are for students under 13, because students must be 13 or older to use the Google Sign-in or username-password methods. But you can use this method for any of your students. Teachers need to activate the PIN system and can optionally allow access this way only from certain IP addresses. For example, I set this up so that my students could only use the PIN from my school.
Username + Password
The traditional way to log in also still exists at Innovation. Students need to know your virtual classroom number, their username assigned on account creation, and a password. The teacher controls the passwords.
The Quick Login Link
If a student cannot log in because they have forgotten their password, you can send them a quick login link. This link automatically logs the student in without a password, Google authentication, or PIN. It only works once, and the link is only good for two hours.
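Under the hood, a link like this can be implemented as a single-use token with an expiry timestamp. The sketch below is hypothetical (Innovation’s actual implementation is not public); the in-memory store, URL, and function names are illustrative only.

```python
import secrets
import time

# Hypothetical sketch of a single-use, time-limited login token.
TOKEN_TTL_SECONDS = 2 * 60 * 60   # links are good for two hours
_tokens = {}                      # token -> (student_id, expires_at)

def issue_quick_login(student_id: str) -> str:
    """Create a random token, record its expiry, and return a login URL."""
    token = secrets.token_urlsafe(16)
    _tokens[token] = (student_id, time.time() + TOKEN_TTL_SECONDS)
    return f"https://example.invalid/login?token={token}"

def redeem(token: str):
    """Return the student id if the token is valid, else None.
    The token is deleted on first use, so it only works once."""
    record = _tokens.pop(token, None)
    if record is None:
        return None
    student_id, expires_at = record
    return student_id if time.time() <= expires_at else None
```

Because the token is popped from the store on first lookup, a second attempt with the same link fails, matching the one-time behavior described above.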
Manage Course Enrollment
Teachers can now restrict access for students to certain courses in their virtual classroom at Innovation. By default, enrollment is “open”, meaning students can freely access any course.
For teachers who need to restrict access by students to a certain course, there is an app to manage enrollment. Access the app from Students :: Student Accounts Management :: Manage Course Enrollment. From here, follow instructions to restrict enrollment and then assign students to a class.
Commercial licensees will find this useful if they are charging for access to a course and need to prevent free access to courses for which students have not paid.
In mid-May 2020, we were finishing up two months of remote learning during the pandemic. I conducted a study to find out what I could learn from the students who were very successful learning online remotely.
Twenty-one respondents to a survey asking successful online learners to report on the “secrets” of their success collectively present a profile of the student who will likely do well in asynchronous distance learning conditions. These students are very self-directed, seldom needing much parent intervention or supervision. Most like working online because there are fewer distractions than in school and they can work on their own schedules and at their own pace. These students have a special place set aside for doing school work and mostly do their school work in one sitting rather than sporadically through the day. These students are not necessarily very academic-oriented in temperament and may not even prefer online learning because they miss their friends and teachers. When asked to advise their peers, common suggestions include ideas like planning out the working and break or recreation time, keeping checklists, and self-motivation strategies.
Besides answering my questionnaire, 14 of 21 respondents accepted my invitation to make suggestions for their peers about their secrets for online success. Their comments are as follows:
I think that it easier to get all of your work done during the same time because then that way you can have the whole afternoon to do whatever you want.
As the Nike logo says “Just Do It”! I try not to put stuff off, however I wish it will be over soon so we can have summer and do other productive things with my family.
i have a system that i follow and i check all classes my email old emails at least 19 times a day
I try to give myself some time in the morning to wake up and have some time to myself, like an hour, then I start my day and work until lunch most times which I then take another hour or two to rest, finally I work until I am finished with little breaks and end most of the time right before dinner. I would say just try to get it done early then you could look forward tp having the rest of the day and if you get ahead then at the end of the week you could possibly have friday off, like I do. Also, just try and not get distracted and if you need to tell your siblings/parents/guardians you need quiet, my mom has learned that she can’t talk to me when I am doing school.
Something that motivates me is when I can take a 5-10 minute break between each subject. I use my RC car for this. While it is charging (it usually takes 45 minutes to charge) I do some work, and when it is done (it has a 10 minute run time) I go out and drive it.
I find that it helps to have a list of what I have to do and when they are due. This helps me to prioritize and not stress out as much about my work. I also tend to do my work in the morning. This way I have time to do my work and I can get it all done early. If I forget about an assignment, this also allows me to do it before it is due.
i just think that after i do all my work i can go out an do anything i want the rest of the day so i use that to motivate me
having parental involvement keeps me on task or i wouldn’t stay on task. My parents also checking power school regularly. I do struggle because i’m not getting as much assistance as i would during school.
Make sure to hunker down and just do your schoolwork. try to follow pretty much the same schedule every day and not get into a mindset of “I have all day to finish”, because chances are you’re just gonna keep on putting it off.
Well about the distractions. The main distraction I have at home and not school is food. Now that I am at home there are lots of food breaks.
I think a schedule is really important. Not only does it limit the amount of distractions in the day, but allows you to get through your work without missing anything or falling behind. I was home schooled before I came hear, and sometimes it’s nice to set apart time where you can watch a show or a play a game or something, that way you don’t feel as inclined to take a break in the middle of your work. That’s all the advice I got! 🙂
I’m getting better grades doing the online learning, but I don’t really like it because I’m not getting the same interactions with teachers and friends that I can get when I’m physically in school.
For me, I do a few hours of school work in the morning and then a few hours of it in the afternoon. I always take about an hour or two for a break in between those times. That break is very nice, and relaxing. I either go for a walk, or try to do another activity that is not school related. I find if I do not take that break, I get too overwhelmed. Questions 2 and 3: My parents check up on me, to see how I am doing. But they do not watch over me. Also, my parents trust that I am getting all the school work in on time, so they do not enforce too many rules, because I stay on top of it myself.
Fully functional and reliable automated “AI” grading of essays is a long way off yet and well beyond the computing capability available in typical secondary school classrooms. However, useful steps in that direction are well within reach, particularly within the domain of the limited vocabulary and composition skills that constitute the typical proficiency level of students in grades six through twelve. High school social studies teachers in New York State assess student essays using a grading rubric provided by the State Education Department. One dimension of this rubric assesses the relative degree of “analytic” versus descriptive writing: students whose essays are more analytical than descriptive produce work of greater value. The artificially intelligent grading program at InnovationAssessments.com estimates the grade of a student writing sample by comparing it to a number of models in a corpus of full credit samples. With a view to developing an algorithm that better imitates human raters, this paper outlines the data and methods underlying an algorithm that yields an assessment of the “richness of analysis” of a student writing sample.
Measuring “Richness” of Analysis in Secondary Student Writing Samples
The New York State generic scoring rubrics for high school social studies Regents exams, both for thematic and document-based essays, value student expository work where the piece “[i]s more analytical than descriptive (analyzes, evaluates, and/or creates* information)” (Abrams, 2004). A footnote in the Generic Grading Rubric states: “The term create as used by Anderson/Krathwohl, et al. in their 2001 revision of Bloom’s Taxonomy of Educational Objectives refers to the highest level of the cognitive domain. This usage of create is similar to Bloom’s use of the term synthesis. Creating implies an insightful reorganization of information into a new pattern or whole. While a level 5 paper will contain analysis and/or evaluation of information, a very strong paper may also include examples of creating information as defined by Anderson and Krathwohl.”
Anderson and Krathwohl (2002), in their revision of Bloom’s Taxonomy, define analysis thus:
4.0 Analyze – Breaking material into its constituent parts and detecting how the parts relate to one another and to an overall structure or purpose. 4.1 Differentiating 4.2 Organizing 4.3 Attributing
One of the ways that students analyze is to express cause and effect relationships (Anderson and Krathwohl’s “4.3 Attributing”). It is possible using natural language processing techniques to identify and examine cause and effect relationships in writing samples using lexical and syntactic indicators. Taking a cue from the New York State rubric, one could judge that a student writing sample is more “richly analytical” if it “spends” more words on cause and effect proportionate to the entire body of words written.
Identifying and Extracting Cause and Effect Relationships using Natural Language Processing
With regard to identifying cause-effect relationships in natural language, Asghar (2016, p. 2) notes that “[t]he existing literature on causal relation extraction falls into two broad categories: 1) approaches that employ linguistic, syntactic and semantic pattern matching only, and 2) techniques based on statistical methods and machine learning.” The former method was selected for this task because the domain is limited to secondary level student writing samples and they use a limited variety of writing structures. Previous work studying this issue yielded better results in domain-specific contexts (Asghar, 2016) and tagging sentences containing cause-effect relationships in this context should be within reach to a high degree of accuracy.
The software is written in Perl. The following process is applied to the student writing sample for analysis:
The text is “scrubbed” of extra consecutive spaces, HTML tags, and characters outside the normal alphanumeric ASCII range.
The Flesch-Kincaid text complexity measure is calculated.
The text is “lemmatized”, meaning words that have many variations are reduced to a root form (i.e., “is, am, are, were, was” etc. are all turned to “be”; “cause, caused, causing” etc. are all turned to “cause.”)
The text is “synonymized”, meaning words are changed to a single common synonym. The text is separated into an array of sentences and all words are tagged by their part of speech.
A variety of lexical and syntactic indicators of cause-effect are used in pattern matching to identify and extract sentences which include a cause-effect relationship into an array.
The resulting array of cause-effect sentences is converted into a “bag of words” without punctuation. Stop words are removed. All words are “stemmed”, meaning inflectional endings and spelling variations are reduced to a common stem.
Finally, both the original text and the array of cause-effect relationships are reduced further to a bag of unique words.
At this point, the computer program compares the bags of words. The resulting percentage is the proportion of unique words spent on cause-effect out of the total number of unique words. Recall that these are “bags of words” which have been lemmatized, synonymized, stemmed, and from which stop words have been removed.
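The pattern-matching and bag-of-words steps above can be sketched in a few lines. This is a hypothetical, minimal re-creation in Python, not the production Perl code: the production system also lemmatizes, synonymizes, and part-of-speech tags the text, while the indicator list, stop-word list, and suffix-stripping “stemmer” here are simplified stand-ins, so the numbers are only indicative.

```python
import re

# Simplified lexical indicators of cause-effect relationships.
CAUSE_INDICATORS = (
    "because", "therefore", "thus", "consequently", "due to",
    "as a result", "led to", "leads to", "caused", "causing", "so that",
)

# Token stop-word list (the real one would be much longer).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on",
    "it", "is", "was", "were", "be", "that", "this", "they", "we",
}

def stem(word):
    """Crude stemmer: strip a few common inflectional suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def bag_of_unique_stems(text):
    """Lower-case, drop stop words, stem, and keep unique stems."""
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) for w in words if w not in STOP_WORDS}

def cause_effect_proportion(sample):
    """Proportion of unique stems 'spent' on cause-effect sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", sample.strip())
    causal = [s for s in sentences
              if any(ind in s.lower() for ind in CAUSE_INDICATORS)]
    all_stems = bag_of_unique_stems(sample)
    causal_stems = bag_of_unique_stems(" ".join(causal))
    return len(causal_stems) / len(all_stems) if all_stems else 0.0

essay = ("The empire expanded rapidly. Because trade routes were secure, "
         "merchants prospered. Heavy taxation caused widespread unrest.")
print(cause_effect_proportion(essay))
```

Multiplied by 100, the returned figure corresponds to the percentage of unique words spent on cause-effect described above.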
Limitations of this Method
There are ways to express cause-effect relationships in English without using lexical indicators such as “because”, “thus”, “as”, etc. For example, one could express cause and effect this way: It was raining very heavily. We put on the windshield wipers and we drove slowly.
“Putting on the wipers” and “driving slowly” are caused by the heavy rain. There are no semantic or lexical indicators that signal this. There are many challenges dealing with “explicit and implicit causal relations based on syntactic-structure-based causal patterns” (Paramita, 2016). This algorithm does not attempt to identify this kind of expression of cause-effect. Prior research in this area has shown limited promise to date (Mirza, 2016, p. 70).
Cause-effect is only one way to analyze. Differentiating (categorizing) and organizing (prioritizing, setting up a hierarchy) should also be addressed in future versions of this software. A student could compose a “richly” analytical piece without using cause-effect, although in this writer’s experience cause-effect is the most common expression in writing of people in this age group.
Analyzing a Corpus of Student Work
The New York State Education Department provides anchor papers for the Regents exams so that raters can have models of each possible essay score on a scale of one to five. Anchor papers are written by actual students during the field testing phase of the examination creation process. Sixty such anchor papers were selected for use in this study from collections of thematic and document-based essays available online at the New York State Education Department website archive (https://www.nysedregents.org/GlobalHistoryGeography/). Thirty came from papers identified as scoring level five and thirty scoring level two. Essays scoring five are exemplary and rare. Papers scoring two are “passing” and represent the most common score. Essays are provided online in PDF format. Each one was transformed to plain text using Google Drive’s OCR feature. Newline characters were removed, as was any text not composed by a student (such as header information). This constitutes the corpus.
The computer program analyzed each sample and returned the following statistics: number of cause-effect sentences found in the sample, the count of unique words “spent” on cause-effect relationships in the whole text, the count of unique words in the entire text, the percentage of unique words spent on cause-effect, the seconds it took to process, text complexity as measured by the Flesch-Kincaid readability formula, and finally a figure that is termed the “analysis score” and is intended to be a measure of “richness” in analysis in the writing sample.
An interesting and somewhat surprising finding came in comparing the corpus of level two essays to those scoring a level five. There was no real difference in the percentage of unique words students writing at these levels spent “doing” analysis of cause-effect. The mean percent of words spent on cause-effect relative to the unique words in the entire text was 46% in level five essays and 45% in level twos. There were no outliers and the standard deviation for the level fives was 0.9; for the level twos it was 0.13. Initially, it seemed that essays of poor quality would have a much different figure, but this turned out not to be the case. What made these level two papers just passing was their length and limited factual content (recall that analysis is only one dimension on this rubric).
Text complexity is an important factor in evaluating student writing. The Flesch-Kincaid readability formula is one well-known method for calculating the grade level readability of a text. In an evaluation of the “richness” of a student’s use of analysis, text complexity is a significant and distinguishing feature. The “analysis score” is a figure intended to convey that combination of text complexity and words spent on cause-effect type analysis. This figure is calculated by multiplying the percentage of unique words spent on cause-effect by 100, and then multiplying by the grade level result of the Flesch-Kincaid formula. This measure yielded more differentiating results. In order to discover ranges of normal performance based on these models, the following statistics were calculated for each data set: lowest score (MIN), first quartile (Q1), median (MED), third quartile (Q3), and highest score (MAX).
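To make the calculation concrete, here is a minimal sketch in Python rather than the production Perl: the Flesch-Kincaid grade-level formula is standard, and the analysis score simply multiplies the cause-effect percentage by that grade level. The example inputs are illustrative, not taken from the corpus.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade-level formula."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def analysis_score(cause_effect_proportion: float, fk_grade: float) -> float:
    """Percentage of unique words spent on cause-effect, times FK grade."""
    return cause_effect_proportion * 100 * fk_grade

# A sample spending 46% of its unique words on cause-effect, written at
# roughly a 9th-grade reading level, scores about 414:
print(analysis_score(0.46, 9.0))
```

Scaling the proportion by text complexity is what lets the score separate level two from level five papers when the raw cause-effect percentages alone do not.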
If this corpus of sixty essays can be considered representative, then the ranges can be considered standards in assessing the richness of secondary level student analysis in a writing sample. These figures can be used to devise a rubric. On a scale of one to four where four is the highest valued sample, the following ranges are derived from the combined statistics of all sixty essays:
Incorporation of Cause-Effect Assessment into AI-Assisted Grading
The artificially intelligent grading assistance provided to subscribers at InnovationAssessments.com, to date, estimates grades for student composition work by comparing eleven text features of the student sample against the most similar model answer in a corpus of one or more model texts. In cases where expository compositions are valued more highly for being “analytically rich”, incorporating this cause-effect function could refine and enhance AI-assisted scoring.
First, the algorithm examines the model in the corpus most similar to the student sample. If the analysis score of that model text is greater than or equal to 419, then analysis is assumed to be a feature of the response’s value. In this case, an evaluation of the “analytical richness” of the student’s work is incorporated into the scoring estimate: samples that are more analytical have a greater chance of scoring well.
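A minimal sketch of that gating logic follows. The 419 threshold comes from the corpus study above, but the adjustment formula and function names here are purely hypothetical, since the actual scoring adjustment is not specified.

```python
# Analysis score at or above which the best-matching model is treated
# as "analytically rich" (threshold from the corpus study above).
ANALYSIS_THRESHOLD = 419

def adjust_estimate(model_score: float, student_score: float,
                    base_estimate: float) -> float:
    """Hypothetical adjustment: the student's own analysis score only
    matters when the best-matching model is analytically rich."""
    if model_score < ANALYSIS_THRESHOLD:
        return base_estimate  # analysis is not a valued feature here
    # Scale part of the estimate by how the student's analysis compares
    # to the model's (capped at 1.0 so strong analysis cannot overshoot).
    ratio = min(student_score / model_score, 1.0)
    return base_estimate * (0.8 + 0.2 * ratio)

print(adjust_estimate(450, 450, 80.0))   # rich model, student matches it
print(adjust_estimate(300, 100, 80.0))   # model not rich: unchanged
```

The point of the gate is that a purely descriptive prompt (low-scoring model) never penalizes a student for writing descriptively.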
An artificially intelligent grading program for secondary student expository writing that includes an evaluation of the richness of analysis in that text would be very valuable. Cause-effect statements are indicators of analysis. The algorithm described here identifies and extracts these sentences, processes them for meaningful analysis, and judges the quality of the student’s analysis with a number which incorporates a measure of the proportion of words spent on analysis and text complexity. An analysis of sixty samples of student writing yielded a range of scores at four levels of quality for use in artificial grading schemes. While this algorithm does not detect all varieties of cause-effect relationships nor even all types of analysis, its incorporation in already established artificial scoring programs may well enhance the accuracy and reliability of the program.
My first experience with asynchronous discussion forums came in courses I was taking myself online through Empire State College a number of years ago. Many readers will recognize the assignment: given a prompt, students are to post their response and then reply to the responses of a number of other students in the class. Typically, there was a deadline by which these discussions had to take place. I liked the exercise and I found it useful to address the course material.
Promoters of asynchronous discussion forums rightly point out that this task brings greater participation than face-to-face class discussions do. Whereas in the latter, participation may be dominated by an extroverted few or limited in other ways, the online forum brings everybody in. Asynchronous discussions also leave time for research and reflection that is not practical in a face-to-face class. Still, there are some practical considerations for students at the middle and high school level that are not usually issues at the college level.
I used asynchronous forum discussions in my middle and high school social studies classes for a decade, in each unit of study. In my context, students were assigned a persuasive prompt on which they were expected to take a position and post two supporting reasons. Next, they were assigned to present the opposing view to another student (even if it did not match their actual personal views), and finally they were to defend their original position in reply to the student who had been assigned to present the opposing view to them.
Seventh and eighth graders needed training right off the bat, naturally. Accustomed to social media, their early contributions were vapid and full of emojis and “txt” language. It was important to remind them that this was a formal enterprise and that standard English conventions held. It was often difficult to get them to elaborate their ideas toward the 200-word goal set for their opening post.
I was working in a small, rural school where I would have the students from grades seven through ten, so I could see their work develop over the years.
I found it to be a good practice to offer the highest marks to those who provided evidence and cited a source. I coded a citation generator right in the forum app to encourage this.
Grading the Posts
Scoring these can be labor intensive for no other reason than the layout of the forum itself. The page is designed for reading and responding, but this does not work well for scoring because there is too much scrolling and searching necessary to view posts and replies.
The main problem I encountered in this assignment was that students would forget to complete it at first. I resolved this by assigning it in class and giving time. For example, on the first day I would present the prompt and instruct students to post their positions that class period before continuing with the day’s other work. The following day, students would have time to post their replies and finally a third day they would post their defense.
Another issue that came up was getting everyone the needed number of replies: some posts would attract more replies than others, and every student needed at least one reply in order to offer a defense. The solution was to modify the assignment and declare that, once one has posted, one is obliged to offer the opposing view to the person above in the forum feed.
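The reply-to-the-person-above rule amounts to a simple chaining scheme that guarantees every post receives exactly one opposing-view reply. Here is a minimal sketch; the function name is my own, and the assumption that the first poster replies to a teacher seed post reflects my habit of posting first under a pseudonym:

```python
def assign_reply_targets(posts, seed="teacher"):
    """posts: author names in posting order.
    Each poster owes an opposing-view reply to the post directly
    above theirs in the feed; the first poster replies to the
    teacher's seed post. Returns {author: reply_target}."""
    targets = {}
    previous = seed
    for author in posts:
        targets[author] = previous  # reply to the post just above
        previous = author
    return targets

# Example: three students posting in order.
# assign_reply_targets(["Ana", "Ben", "Cal"])
# -> {"Ana": "teacher", "Ben": "Ana", "Cal": "Ben"}
```

Because each student replies to exactly one earlier post and receives exactly one later reply, no post is left without a challenger.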
Interestingly, these assignments also led to spontaneous face-to-face class discussions, sometimes with me and sometimes within a group. Although this may have been somewhat distracting for students in the class working on other things, we found some compromise time to allow these spontaneous interactions to proceed without disrupting the other work much. These were golden opportunities, conversations of enormous educational benefit that are so hard to initiate and encourage artificially.
I came to regard the discussion each unit as a sort of group persuasive writing effort. I included training in grade eight in persuasive writing and logical fallacies. The discussion app here at Innovation has a feature which allows readers to flag posts as committing a logical fallacy.
The Innovation Discussion Forum App is a 21st Century Learning Space
Guardrails: The app lets the teacher monitor all conversations and to delete problematic ones.
Training Wheels: The teacher can attach a grading rubric and sample posts. I used to post first under a pseudonym to whom the first student could reply. Additionally, weaker students can peruse the posts of stronger students in an effort to get a clear picture of the kinds of opinions that can be had on the issue.
Debriefing: Debriefing is easily achieved by projecting the discussion screen on the front board. Student posts in this task are not anonymous.
Assessment and Feedback: The scoring app is very efficient and highly developed from years of use. The teacher can view all of a student’s posts and replies without having to scroll across the entire platform. Analysis tools reveal the readability of the text, how much the student wrote, and how analytical the writing is.
Swiss Army Knife: The discussion app lends itself well to more in-depth persuasive writing assignments such as an essay.
Locus of Data Control: The student chat submissions are stored on a server licensed to and controlled by the teacher. Commercial apps such as Facebook and Twitter may be less dedicated to the kinds of privacy and control exigencies of education.
Ideas in Closing
Asynchronous discussions are great – my students and I enjoyed these tasks. It is my view that the higher-level thinking demanded by persuasion and debate (Bloom’s evaluation level of the cognitive domain) really enhances long-term memory of the content. I cannot emphasize enough the value of these kinds of higher-order tasks. Working in a 21st century learning space promotes the participation of everybody.
Research and development of software to harness artificial intelligence for scoring student essays faces many significant obstacles. Machine learning techniques require massive amounts of data and computing power far beyond what is available to the typical secondary public school. The cost and effort to devise such technology hardly seem to be juice worth the squeeze, since it is still more time-efficient and cost-effective to just have a human do the job. However, the potential exists to devise AI-assisted grading software whose purpose is to increase the speed and accuracy of human raters. “Assisted” AI grading applies natural language processing strategies to student writing samples in a narrowly defined context and operates in a mostly “supervised” fashion: a human rater activates the software and makes scoring judgments with the advice provided by the AI. A promising area for this more narrowly contextualized application of artificially intelligent natural language processing is in scoring summaries and short answer tests. It also poses interesting possibilities for automated coaching for students while they write. This study examines a set of algorithms that derives a suggested score for a secondary-level student summary or short answer test response by comparing the student work to a corpus of model answers selected by a human rater. The human rater stays on duty throughout the scoring process, adding full-credit student work to the corpus so that the AI is trained, and selecting the final student scores.
Features of Text for Comparison
The AI examines the following text characteristics to evaluate student work by comparison to one or more models:
“readability” as determined by the Flesch-Kincaid readability formula
the percent difference in number of unique words after preprocessing. “Preprocessing” refers to scrubbing the text of irrelevant characters like HTML tags and extra spaces, then lemmatizing, synonymizing, and finally stemming it.
intersecting noun phrases
cosine similarity of unigrams
cosine similarity of bigrams
cosine similarity of trigrams
intersecting proper nouns
cosine similarity of T-score
intersecting bigrams as percent of corpus size
intersecting trigrams as percent of corpus size
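The first feature on the list, Flesch-Kincaid readability, has a standard published formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal sketch follows; the syllable counter is a crude vowel-group heuristic of my own (production implementations typically use a pronunciation dictionary):

```python
import re

def count_syllables(word):
    # Crude heuristic: count runs of vowels. Good enough for a
    # readability estimate, not for exact syllabification.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)
```

Simple, monosyllabic sentences score near (or below) grade zero, while long sentences with polysyllabic words push the grade level up.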
The program first compares the student text to each model using cosine similarity of n-grams. The most similar model in the corpus is then compared to the student work on the remaining text features. Four hundred twenty-six short answer responses that had been scored by a human rater were compared using the algorithm. From these results, scoring ranges were developed within each text feature typifying scores of 100, 85, 65, 55, and 0. Outliers were removed from the dataset. Next, sets of student summaries were scored using the ranges for each text feature and the program’s scoring accuracy was monitored. With each successive scoring trial, the profiles were adjusted, sometimes more intuitively than methodically, until over the course of months the accuracy rate was satisfactory.
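The model-selection step, picking the corpus entry most similar to the student text by n-gram cosine similarity, can be sketched with nothing but the standard library. The choice of bigrams (n=2) below is my assumption; the function names are my own:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cosine_similarity(a, b):
    """Cosine similarity between two lists of n-grams,
    treated as sparse count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def most_similar_model(student_tokens, model_corpus, n=2):
    """Return the model (a token list) whose n-gram profile is
    closest to the student work."""
    student = ngrams(student_tokens, n)
    return max(model_corpus,
               key=lambda m: cosine_similarity(student, ngrams(m, n)))
```

Only the winning model then needs the full battery of feature comparisons, which keeps the expensive part of the analysis to a single pairing.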
When a student writing sample is analyzed for scoring, its value on each text feature is compared to the profiles and the program keeps a tally of matches for each scoring category (100, 85, 65, 55, and 0). The best fit yields the first-stage suggested score. Noun phrases, intersecting proper nouns, and bigram cosine similarity were found to correlate most highly with matching the human rater’s score, so an additional calculation is applied to the profile tallies to weight these features. Next, a set of functions calculates partial-credit possibilities for scores of 94, 76, and 44 using statistics from the analysis of the original dataset of 426 samples. Finally, samples where analysis is important in the response have their score adjusted one final time.
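The tally-and-weight step might look like the sketch below. The profile ranges, feature names, and weight values are illustrative assumptions, not the production figures:

```python
def suggest_score(feature_values, profiles, weights=None):
    """profiles: {score_category: {feature: (low, high)}} ranges
    derived from the human-scored dataset.
    feature_values: {feature: value} for the student sample.
    weights boosts the features found to track the human rater
    best (names here are assumptions).
    Returns the category with the highest weighted match tally."""
    weights = weights or {"noun_phrases": 2.0,
                          "proper_nouns": 2.0,
                          "bigram_cosine": 2.0}
    tallies = {}
    for category, ranges in profiles.items():
        tally = 0.0
        for feature, (low, high) in ranges.items():
            value = feature_values.get(feature)
            if value is not None and low <= value <= high:
                # Feature falls inside this category's range.
                tally += weights.get(feature, 1.0)
        tallies[category] = tally
    return max(tallies, key=tallies.get)
```

Partial-credit adjustments (94, 76, 44) would then be applied downstream to the category this function suggests.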
The development of the scoring ranges for text features proceeded somewhat methodically and at times more intuitively or organically. Over the course of months, as error patterns in AI scoring became apparent, adjustments were made to improve performance. Natural language processing, even at this basic level, is very demanding of computer memory and processing resources. At this writing, the server running this software has 6GB of RAM, and work is often being done on the code to reduce processing time. One strategy is to store both “raw” and processed versions of the student work products as they are written so that processing time can be shortened at the end. The corpus of model responses is also saved in this way.
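The store-both-versions strategy might be sketched as follows. The cleanup here covers only the first stage of the pipeline (scrubbing tags and extra spaces); lemmatizing, synonymizing, and stemming would follow in practice (via NLTK or spaCy, for instance) and are omitted here. Function names are my own:

```python
import re

def preprocess(text):
    """First stage of the cleanup pipeline: scrub HTML tags and
    extra whitespace, then lowercase. Lemmatization, synonym
    mapping, and stemming would follow in a full implementation."""
    text = re.sub(r"<[^>]+>", " ", text)       # scrub HTML tags
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text.lower()

# Keep raw and processed versions side by side as work is
# submitted, so the scoring pass can skip the expensive cleanup.
cache = {}

def save_submission(student_id, raw):
    cache[student_id] = {"raw": raw, "processed": preprocess(raw)}
```

Doing the cleanup once at submission time, rather than again at scoring time, spreads the processing load across the class period instead of concentrating it at the end.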
Training the AI
Upon creation of an assignment, the teacher can save model responses to the corpus. Once students have completed the assignment, the teacher can begin by reviewing and scoring the work product of students who usually score full credit. Upon confirming that these are indeed full credit models, the teacher can click a button to add the student sample to the corpus of model answers. The software limits the teacher to five models in short answer testing and seven models in composition assessment.
Once the AI is trained, the teacher can run the scoring algorithm on each student submission. At this writing, processing takes about nine seconds per sample on average, depending on the text size. This program works best for assignments where there is a narrow range of full-credit responses; its primary purpose is to score writing samples by comparing them to a limited number of full-credit responses. Its strength is in recognizing similar meaning across texts that vary in how they say the same thing. This program does not assess spelling or technical/mechanical writing conventions, although it does rely on student accuracy to the extent that adherence to certain conventions is necessary for the program to operate. Examples: proper noun counts require that students capitalize them; sentence counts require that students apply standard rules of punctuation.
As I developed Innovation Assessments over the years, it was at first a hobby: I enjoy coding, and I liked being able to create apps for my teaching practice that did exactly what I wanted. I think my students liked having me build features they requested, and I know they got a chuckle out of the bugs in early versions of the software I wrote. Although selling subscriptions to the platform was always a goal of mine to make a modest side income, I don’t think that motivated me as much as those first conditions.
If you’ve read my previous post about recently making Innovation a free platform, you probably sensed a tinge of regret between the lines. I admit that there was. I don’t think I can compete against the big names out there in the LMS market, and by now most teachers have invested a lot in their preferred LMS (usually Google Classroom). I will say that seeing the new sign-ups since this platform became free has given me some consolation that at least it’s being used.
Readers will notice that I am still making an effort in an entrepreneurial spirit: there is a commercial subscription tier, and it is this tier I want to present in this post.
Who needs a commercial license?
Innovation is free to individual teachers who are employed by a non-profit or not-for-profit organization. This would include public and private schools.
Subscribers who meet any of the following criteria need a paid commercial license for Innovation:
You work for a for-profit company teaching or tutoring online or remotely.
You are a “teacher-author” and you want to make your content at Innovation available for sale through third-party online marketplaces like TeachersPayTeachers.com or TeachSimple.com.
Ways to Earn from your Creativity here at Innovation
Like many teacher-authors, I sell my products at online marketplaces. I am grateful for the existence of these opportunities, so I am committed to respecting their terms and conditions of sale. Over the past two years, I have been writing a set of apps purely for my own use. These apps allow me to create digital products for sale at online marketplaces, products which conform to their terms and conditions.
Subscribers to the commercial tier of Innovation will have access to these apps as well so that they can quickly generate products for posting at online marketplaces for sale.
Multiple-Choice Product Generator
Sharing multiple-choice test question banks can be challenging because there are many different formats out there for test generators. This is true here as well – Innovation has its own unique format.
Subscribers to the commercial license to Innovation can generate a multiple-choice product almost instantly. The product consists of a zip file that automatically contains:
an HTML version of the questions for your customer to copy-paste into their own tests — this includes optionally associated images and sound files
a plain text file version of the questions
a passcode so that buyers can give their students access to the online version of the task
a .CSV format file of the questions for easy import into a spreadsheet like Google Sheets or Excel.
all of the associated sound and image files
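Bundling a bank into a zip like the one described above is straightforward with the standard library. A minimal sketch, leaving out images, sound files, and the passcode; the function name, field names, and file names inside the zip are my assumptions:

```python
import csv
import io
import zipfile

def build_product_zip(path, questions):
    """Bundle a question bank into a product zip containing HTML,
    plain-text, and CSV versions of the questions.
    `questions` is a list of dicts with 'prompt' and 'choices'
    keys (field names are assumptions)."""
    with zipfile.ZipFile(path, "w") as z:
        # HTML version for copy-pasting into the buyer's own tests
        html = "".join(
            f"<p>{q['prompt']}<br>{' / '.join(q['choices'])}</p>"
            for q in questions)
        z.writestr("questions.html", html)
        # Plain-text version
        z.writestr("questions.txt", "\n".join(
            f"{q['prompt']} :: {', '.join(q['choices'])}"
            for q in questions))
        # CSV version for easy import into Sheets or Excel
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["prompt", "choices"])
        for q in questions:
            writer.writerow([q["prompt"], " | ".join(q["choices"])])
        z.writestr("questions.csv", buf.getvalue())
```

Because everything is generated from one in-memory question list, producing the whole bundle takes only a moment, which is consistent with the under-three-seconds figure mentioned below.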
One feature of the passcodes that is not currently available to commercial subscribers is import. If a buyer has one of my passcodes, they can use it to import the test bank they purchased right into their Innovation account. I figure not all commercial subscribers will want this, so it is disabled for them.
Here’s a free sample product for you to download and inspect. It was my final exam for my US History and Government class in 2022. I generated the product in under 3 seconds (not counting, of course, the time entering the questions…).
The passcode arose out of respect for the terms and conditions of teacher author marketplaces. They require, rightly, that no product sold at their site should require a registration to a third party site. The passcode lets you sell access to your Innovation content on InnovationAssessments.com in such a way that the buyer’s students can carry out the task without logging in or registering.
Here’s one you can try: USHIS 11 Final Exam, A11812Z-13YU-1QIH-17-JON [preview]
Passcodes work for:
short answer, including pre-training for the AI grading assistant
Etudes (these are video or PDF embeds with associated questions)
Ventura games (Jeopardy-like games)
Commercial tier subscribers have access to the app that generates passcodes. It is a very flexible system that lets you offer a preview that automatically limits access to the potential buyer.
Additional Terms of Service
Commercial license holders agree to a few additional terms:
You attest that content of any type to which you are granting access is an original work and it does not infringe upon the Intellectual Property rights of others.
You attest that content that is copyrighted and/or trademarked materials in your offering does not infringe upon the Intellectual Property rights of others. You have either received express permission to use such materials, or you hereby certify that the use of such materials is otherwise non-infringing, for example as a fair use.
You agree not to post any advertising or solicit membership or subscription or attendance at any venue or event virtual or real that is not connected in a reasonable, direct, and supportive way with the course you are teaching.
Innovation Assessments LLC does not provide payment processing services.
Yesterday after a lot of reflection, I decided to change the pricing model at Innovation and just make access to the platform free.
I know, that’s pretty bad business thinking, right? My friends tell me I was under-pricing it at $36, considering what you get: lesson planner, AI-assisted scoring, test generators, etc. But that’s the kind of thing friends say and I appreciate it. The fact is, despite quite a healthy number of people signing up for free trials over the past five years, for reasons I can only speculate about, very few went on to become paying subscribers.
I think my first issue is that I do not have a big start-up budget or a group of investors backing my project. This has been a solo act for two decades. The world of venture capital is alien to me, and both my wife and I were averse to putting our own assets at risk! Innovation Assessments LLC has always paid for itself.
My timing is probably not great for this kind of startup. My first venture into offering an online subscription service was in 2002, when I had good fortune with FrenchRegents.com (I eventually developed GermanRegents.com and SpanishRegents.com). But I got out of it in 2007, discouraged when review materials like the ones I offered as a paid service became available elsewhere for free. In 2018, I began using, adapting, and updating my old software to accommodate teaching in a classroom where all my students now had ChromeBooks. I like coding anyway, and it was fun to collaborate with my students to debug and develop code. I think they enjoyed being beta testers and got a kick out of some of the bugs.
When the pandemic hit, I thought I was in a good position to launch this business. Demand for what I offer would be high but, while I did enjoy a large number of free trial subscribers, I did not have an advertising budget to really get the word out. I will say this, though: I really pushed my coding and devised interesting new takes on software for education.
Innovation Assessments LLC has two branches of business: the online subscription learning management system (LMS) and digital materials for teaching social studies and French (the two subjects I taught during each half of my career). We sell the digital materials in online marketplaces for teacher resources, and this branch of the business carries the other, which has low subscriber rates.
I am reminded that education is a really hot market right now for web-based services. It is really hard for a small startup like ours to get any attention in that market right now. We considered attending conferences as vendors / presenters to make the case for our online platform, but have reconsidered that as an expense with limited chance of success.
Now, I don’t really want to just quit either. The Innovation teaching platform and the 21st century learning space concept are very strong. I still have some use for the platform, doing part-time contract teaching gigs now that I am retired. And it’s really hard to imagine just shuttering this platform because, well, I think it is really quite excellent.
I developed the TestDrive app a few years ago to be able to sell access to my course content on third party sites that are marketplaces for teacher resources. The user agreement at those services usually stipulates that teacher authors cannot require registration at their own sites to use products sold on their marketplace site. TestDrive lets me sell access passcodes to teachers, who can give them to students and enjoy many of the AI-assisted and auto-correct features of being subscribers. Naturally, I ask people to consider signing up, but it’s not required to use a passcode at TestDrive.
The pricing model I have adopted today sets individual access to the platform at a price of FREE. One of my reasons for this is simply that, after decades of coding and work and with an appreciation of this creation, I can’t quite let it go yet. A second reason is the hope that people will use and enjoy the platform in larger numbers. I have a number of ideas for apps that I would like to add later, for which subscribers who already enjoy the platform may gladly pay a nominal fee.
I created a second “premium” pricing tier as well, a commercial subscription. For this fee, teachers can use TestDrive like I do and can sell access to the same third party marketplaces (like TeachSimple and TeachersPayTeachers) that I do.
Reader, I thank you for visiting Innovation today. I hope you’ll consider subscribing and seeing what we have to offer here. And now you may rest assured that it will cost you $0.00. 🙂
The debriefing is a powerful tool for teaching to which students readily respond. I have had students tell me they really felt they benefited from these activities.
In general, the debriefing is a lesson that consists of analyzing student errors and offering corrections. Naturally, this is done anonymously so as to avoid embarrassment. It is particularly useful in teaching writing, computer programming, and similar complex tasks that can be broken down into smaller skill sets for training.
By way of another example, when teaching social studies, I help students develop skills for analyzing historic documents using constructed response tasks. This assignment calls upon students to provide historical or geographical context for a document and then to analyze its reliability and relationships with other documents such as cause-effect, turning point, or to compare and contrast. Especially for the reliability element, it is useful to display student work, both strong and weak, for commentary and analysis.
If you’ll indulge a final example, when I teach persuasive writing I like to display student samples in class and we can practice together identifying claims, warrants, rebuttals, and so forth. We can weigh the strength of arguments and of writing style.
21st century learning spaces are designed to facilitate debriefing for all sorts of tasks. Since this is a key feature of my own teaching practice, it is really baked in to the Innovation platform:
Multiple-choice: Teachers can start up a “live session” after a test to review. In the live session, the host displays the question and students join the session from their own devices and interact. (Kahoot! is a well-known example).
Short Answer: Teachers can initiate a “live session” for short answer that works the same way.
Jeopardy-Style Review: It is easy to select questions from a set of recent tasks such as quizzes or short answer prompts and then generate a Ventura game.
Analytics: Innovation has a complete set of analytics tools for all online tasks. This includes multiple-choice and short answer item analysis, standardized (“curved”) grading functions, and statistical analysis tools to evaluate and compare assessments. Analytics tells teachers what to debrief; what has priority for review and remediation.
Item Analysis: The test “master” for each assessment presents an item analysis of student work and a ready-to-display version of the test.
Debriefing lesson planning can be very arduous. It can be time-consuming to create a slide show or document with copy-pasted elements of student work submissions for analysis. The Innovation platform facilitates this in multiple ways with a few clicks, in true form to a strong 21st century learning space.