
Evaluation

  • LangSmith article on evaluation concepts, really good - https://docs.smith.langchain.com/concepts/evaluation

Karpathy tweet

  • Karpathy notes on Apple via tweet 11/06/24:

Actually, really liked the Apple Intelligence announcement. It must be a very exciting time at Apple as they layer AI on top of the entire OS. A few of the major themes.

- Step 1 Multimodal I/O. Enable text/audio/image/video capability, both read and write. These are the native human APIs, so to speak.
- Step 2 Agentic. Allow all parts of the OS and apps to inter-operate via "function calling"; kernel process LLM that can schedule and coordinate work across them given user queries.
- Step 3 Frictionless. Fully integrate these features in a highly frictionless, fast, "always on", and contextual way. No going around copy pasting information, prompt engineering, or etc. Adapt the UI accordingly.
- Step 4 Initiative. Don't perform a task given a prompt, anticipate the prompt, suggest, initiate.
- Step 5 Delegation hierarchy. Move as much intelligence as you can on device (Apple Silicon very helpful and well-suited), but allow optional dispatch of work to cloud.
- Step 6 Modularity. Allow the OS to access and support an entire and growing ecosystem of LLMs (e.g. ChatGPT announcement).
- Step 7 Privacy. <3

We're quickly heading into a world where you can open up your phone and just say stuff. It talks back and it knows you. And it just works. Super exciting and as a user, quite looking forward to it.

Prompting

Articles

Strategies

Books

Evaluation

Tools

Papers

Engineering and Development

Iterative Loop

Libraries

Education

Papers

Blogs

Safety and Standards

Safety Guidelines

Standards

AI Product Strategy and Portfolio

Strategy Documents

Tools and Resources

Reflection Tools

Instructor Use Cases

Knowledge Graphs

GPT Researcher

Companies to Watch

Routing and Agentic RAG

Miscellaneous

AI agent infrastructure

  • https://www.madrona.com/the-rise-of-ai-agent-infrastructure/

Goals

Agent Collusion

Consulting Proposal

McKinsey State of AI

AI Speakers

Shadow AI

  • This keeps coming up as a thing.

Few-Shot Learning

Perplexity Pages


O'Reilly Article on prompting

https://www.oreilly.com/radar/what-we-learned-from-a-year-of-building-with-llms-part-i/

https://t.co/UT3pqK9uK67

https://applied-llms.org/

  • https://hamel.dev/blog/posts/evals/
  • https://eugeneyan.com/writing/evals/

Think I need a page just on evaluation

Prompting

  • https://eugeneyan.com/writing/prompting/

Goals

  • https://www.loom.com/share/d0ff7c0b17b34aa7b46f9537c5b25785?start_download=true

I really like this idea of taking two companies, a vendor and a potential client, and working out the "Goals" [as I see it] that make the vendor relevant to the client, to help in sales calls.

Engineering

Hrishi Olickel's iterative loop of Chat, Play, Loop, and Nest guides a lot of my thinking on how to build agentic workflows. In general, the build principles I follow are:

  • Pipelines > Prompts
  • Inputs > Outputs
  • APIs > Abstractions
  • Multimodal > Text

The table below collects more specific technique tips and tricks - summarised from Hrishi's talks - across the build cycle.

Stage | Techniques

Planning & Design
  - Iterative Loop:
      • Chat: Explore options
      • Play: Edit in OpenAI playground (60% time)
      • Loop: Add data and test cases
      • Nest: Simplify with subtasks
  - Time Allocation:
      • Playing: 60%
      • Prompt Tuning: 20%
      • Input Massaging: 10%
      • Coding: 10%
      • Tooling: 1%

Development
  - Dos:
      • Use all modalities
      • Code for input and output structure
      • Leverage pretraining information
      • Reduce search space upfront
  - Don'ts:
      • Add abstractions between yourself and the LLM
      • Stick to one model
      • Have too high an I/O ratio

Testing & Debugging
  - Debugging:
      • Start at the prompt level or try a different model
      • Transform input and add more structure to output
      • Classify tasks and errors to identify where the failure is
  - Optimizing AI:
      • Break prompts to reduce complexity
      • Use separate models for structure and long-form writing
      • Implement state management and self-healing

Deployment & Scaling
  - Future Planning:
      • Assume everything will be 10x to 50x cheaper and faster in the future
      • Build for at least 6 months ahead
  - AI Resources:
      • Be mindful that the cost of standard attention grows quadratically with context length
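
A minimal sketch of the Loop stage (add data and test cases) might look like the following. It assumes a hypothetical `stub_model` function standing in for a real LLM call; the `Case` class, the prompt template, and the support-message labels are illustrative only and are not taken from Hrishi's talks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    name: str
    text: str
    expected: str


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: labels a message by keyword."""
    lowered = prompt.lower()
    if "refund" in lowered:
        return "billing"
    if "crash" in lowered:
        return "bug"
    return "other"


def run_cases(model: Callable[[str], str], template: str, cases: List[Case]) -> Dict[str, bool]:
    """Run every test case through the model and record pass/fail per case."""
    results = {}
    for case in cases:
        prompt = template.format(text=case.text)
        output = model(prompt).strip().lower()
        results[case.name] = output == case.expected
    return results


TEMPLATE = (
    "Classify the support message as billing, bug, or other.\n"
    "Message: {text}\n"
    "Label:"
)

CASES = [
    Case("refund", "I want a refund for last month", "billing"),
    Case("crash", "The app crashes on launch", "bug"),
    Case("praise", "Love the new design", "other"),
    Case("charge", "I was charged twice", "billing"),
]

results = run_cases(stub_model, TEMPLATE, CASES)
failures = [name for name, ok in results.items() if not ok]
print(f"{len(CASES) - len(failures)}/{len(CASES)} passed; failures: {failures}")
```

Here the stub misses the "charged twice" message, which is the kind of failure the loop is meant to surface before spending more time on prompt tuning or switching models.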

Andrew Ng

https://www.deeplearning.ai/the-batch/issue-249/

When building complex workflows, I see developers getting good results with this process:

Write quick, simple prompts and see how it does. Based on where the output falls short, flesh out the prompt iteratively. This often leads to a longer, more detailed, prompt, perhaps even a mega-prompt. If that’s still insufficient, consider few-shot or many-shot learning (if applicable) or, less frequently, fine-tuning. If that still doesn’t yield the results you need, break down the task into subtasks and apply an agentic workflow.
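
As a rough illustration of the step from a simple prompt to few-shot prompting, here is a small sketch of a prompt builder. The `build_prompt` helper, the "Input:/Output:" layout, and the city-extraction task are assumptions made for this example, not something from the article.

```python
from typing import Sequence, Tuple


def build_prompt(
    instruction: str,
    query: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a prompt; with no examples it is zero-shot, otherwise few-shot."""
    parts = [instruction.strip()]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The final block is left open for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


INSTRUCTION = "Extract the city name."
QUERY = "Flights to Perth are delayed"

# Start simple: instruction plus query only.
zero_shot = build_prompt(INSTRUCTION, QUERY)

# If the simple prompt falls short, add worked examples.
few_shot = build_prompt(
    INSTRUCTION,
    QUERY,
    examples=[
        ("Sydney trains are late", "Sydney"),
        ("Rain expected in Hobart", "Hobart"),
    ],
)

print(few_shot)
```

Keeping the examples as data rather than hard-coding them into the prompt string makes it easier to grow from few-shot to many-shot before reaching for fine-tuning or an agentic workflow.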

prompting strategies : https://arxiv.org/abs/2311.16452?utm_campaign=The%20Batch&utm_source=hs_email&utm_medium=email&_hsenc=p2ANqtz-_PU4gmbfJN9_gBrzLMkZheDB1ROQnQWYv9cSxeMK53CO9ix0aYRLcabOd6v3xmmbHcM7HE

Hrishi Budget

https://olickel.com/object-oriented-large-language-models

Cameron Wolfe

Many useful ideas from his writings around prompt engineering and RAG https://cameronrwolfe.substack.com/

Reflection

Reflection, at its simplest, is about reviewing what was done and trying to improve it. Evaluation is a subset of reflection, or maybe the other way around. Here's a great list of the evaluation/observability tooling available:

  • https://ianww.com/llm-tools
  • tinyllm seems particularly interesting to me
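
To make the reflection idea concrete, here is a minimal sketch of a draft, critique, revise loop. The three callables are hypothetical stubs standing in for LLM calls, and the `reflect` helper is an assumption for illustration, not the API of any tool listed above.

```python
from typing import Callable, List, Optional, Tuple


def reflect(
    draft_fn: Callable[[str], str],
    critique_fn: Callable[[str, str], Optional[str]],
    revise_fn: Callable[[str, str, str], str],
    task: str,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Draft once, then critique and revise until no feedback or rounds run out."""
    draft = draft_fn(task)
    history = [draft]
    for _ in range(max_rounds):
        feedback = critique_fn(task, draft)
        if feedback is None:  # the critic is satisfied
            break
        draft = revise_fn(task, draft, feedback)
        history.append(draft)
    return draft, history


# Hypothetical stubs; in practice each would be a model call.
def stub_draft(task: str) -> str:
    return "reflection helps"


def stub_critique(task: str, draft: str) -> Optional[str]:
    if not draft[0].isupper():
        return "capitalise the first letter"
    if not draft.endswith("."):
        return "end with a full stop"
    return None


def stub_revise(task: str, draft: str, feedback: str) -> str:
    if "capitalise" in feedback:
        return draft[0].upper() + draft[1:]
    if "full stop" in feedback:
        return draft + "."
    return draft


final, history = reflect(stub_draft, stub_critique, stub_revise, "Write one sentence.")
print(final)
print(len(history), "versions")
```

In this framing the critique step is where evaluation sits inside reflection: the same check could be run on its own as an eval, or fed back into a revise step as shown.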

I also want to consider Logfire from Pydantic:

  • https://pydantic.dev/logfire
  • https://python.useinstructor.com/blog/2024/05/01/instructor-logfire/

An evaluation paper

  • https://arxiv.org/abs/2404.12272

Instructor use cases I want to explore

  • https://x.com/ddebowczyk/status/1792105564966662523
  • https://x.com/jxnlco/status/1788558053094117691 [Some rag thing I need to review]

  • In the docs there is a create_iterable method that can be used instead of Iterable[T] - https://github.com/jxnl/instructor?tab=readme-ov-file#streaming-iterables-create_iterable

Other libraries I'm thinking through

  • https://research.csiro.au/ss/team/se4ai/responsible-ai-engineering/
  • https://research.csiro.au/ss/team/se4ai/

Routing [Probably more urgent that I read Leah's post]

  • https://www.linkedin.com/pulse/how-create-good-ai-planner-agent-leah-bonser-mvrsf/?trackingId=nMqml7p%2BTVK2nhzyPLpsxQ%3D%3D

  • Note Leah also wrote a second post.

Jeremy Howard Stuff I need to read

  • https://www.youtube.com/watch?v=jkrNMKz9pWU
  • https://github.com/fastai/lm-hackers/blob/main/lm-hackers.ipynb
  • https://x.com/HamelHusain/status/1793319488731107718

LlamaIndex router stuff

  • https://medium.com/aimonks/agentic-rag-with-llama-index-router-query-engine-01-381e83a418af
  • Have signed up to the course on deeplearning.ai

AI Product Strategy for portfolio

Ton of awesome AI speakers

https://x.com/HamelHusain/status/1792223793903276343

Prompt book coming out

  • https://www.oreilly.com/library/view/prompt-engineering-for/9781098156145/

Agent collusion

https://arxiv.org/abs/2404.00806

Hamel Husain

Consulting proposal from a meeting : https://gist.github.com/hamelsmu/ac72d18ee9d4cbd6a235a8e37a75f303

Australian safety standards

  • https://www.cyber.gov.au/resources-business-and-government/governance-and-user-education/artificial-intelligence/deploying-ai-systems-securely

Education

https://towardsdatascience.com/ai-knocking-on-the-classrooms-door-87db39d00b94

GPT - Researcher

  • 1 : https://x.com/hu_yifei/status/1793751353308901394
  • 2 : https://github.com/assafelovic/gpt-researcher

Education

  • Google education paper (LearnLM). Worth reading; long and dense: https://storage.googleapis.com/deepmind-media/LearnLM/LearnLM_paper.pdf

  • Donald Clark Blog - https://donaldclarkplanb.blogspot.com/

  • https://www.ai-supremacy.com/p/ai-in-education-googles-learnlm-product

Companies to watch

  • https://www.studyfetch.com/

Few shot tool use doesn't really work yet

  • https://research.google/blog/few-shot-tool-use-doesnt-really-work-yet/

Perplexity pages

  • https://www.perplexity.ai/hub/faq/what-is-perplexity-pages

Knowledge graphs

  • https://gradientflow.com/graphrag-design-patterns-challenges-recommendations/

Evaluation

  • https://x.com/eugeneyan/status/1796300181186732522

McKinsey state of AI

https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2023-generative-ais-breakout-year

Aha moments for voice

Some specific, niche examples of potential aha moments:

1. Product Manager's Design Sprint

A product manager is leading a design sprint. While walking, they interact with the LLM to brainstorm ideas, outline user personas, and sketch out user journeys. The LLM helps by offering creative suggestions, asking probing questions, and even generating wireframes. This turns what would be idle walking time into productive, ideation-rich sessions.

2. Medical Researcher Reviewing Clinical Trials

A medical researcher on their way to the lab uses their walk to review the latest clinical trial data. The LLM summarizes complex medical journals, highlights key findings, and answers specific queries about methodologies or results. This ensures that the researcher is well-prepared and updated, maximizing their time efficiently.

3. Job Seeker Practicing Industry-Specific Interviews

An IT professional seeking a new job uses their morning walks to practice for interviews. The LLM conducts mock interviews, focusing on technical questions related to the latest programming languages and frameworks. It also provides feedback on responses and suggests improvements, helping the job seeker feel more confident and prepared.

4. Startup Founder Refining a Pitch

A startup founder uses their evening walks to refine their pitch. The LLM assists by simulating investor questions, helping to craft compelling narratives, and suggesting data points to include. This iterative process helps the founder fine-tune their pitch deck and elevator speech, making them better prepared for investor meetings.

5. Musician Composing and Arranging Music

A musician on a stroll uses voice interactions with an LLM to compose and arrange music. They hum melodies, and the LLM helps to notate the music, suggest chord progressions, and even generate accompanying parts for other instruments. This transforms a simple walk into a creative session, leading to new compositions and arrangements.

6. Author Developing a Novel Plot

An author uses their walks to develop novel plots. The LLM assists by discussing character development, plot twists, and thematic elements. The author can voice their ideas, and the LLM offers constructive feedback, alternative scenarios, and even dialogue snippets. This keeps the author in a creative flow, turning walks into productive brainstorming sessions.

These examples illustrate how specific, detailed interactions with an LLM can turn otherwise idle walking time into highly productive and insightful sessions tailored to various professional and creative needs.

Idea - Voice enabled

  • AI Powered design sprints with voice
  • walkandtalk.ai

Pattern for building LLM applications (Hrishi)

  • https://simonwillison.net/2024/Apr/9/a-solid-pattern-to-build-llm-applications/
  • https://youtu.be/8w0hUcQSDy8?si=VvwvHVuUv94aIzuu

Agent design patterns

  • https://arxiv.org/abs/2405.10467
  • agent reference architecture - https://arxiv.org/abs/2311.13148
  • https://www.youtube.com/watch?v=MXPYbjjyHXc

recommendation systems

  • https://www.buildingrecsys.com/

Advancing to agents

  • https://www.youtube.com/watch?v=MXPYbjjyHXc

Prompt report taxonomy

https://arxiv.org/abs/2406.06608

Really interesting infrastructure market map

https://www.bvp.com/atlas/roadmap-ai-infrastructure

AI regulation by Jeremy Howard

https://www.answer.ai/posts/2024-06-11-os-ai.html

LangGraph

  • Explanation - https://www.youtube.com/watch?v=hvAPnpSfSGo

Really good podcast on agents by Harrison Chase

https://www.youtube.com/watch?v=6XZLoW0-mPY&t

Spiral - Business idea

  • https://www.youtube.com/watch?v=iZw5GHuR9IY
  • https://spiral.computer/

Notebooks

https://youtu.be/-kdl04xqasY

Live podcast on O'Reilly LLMs

https://www.youtube.com/watch?v=c0gcsprsFig&t=2839s

Business Idea : Auto convert market map into a nice UI

https://app.dealroom.co/lists/46345

Claudette :

  • https://www.youtube.com/watch?v=p_8Zk6HUCV8
  • https://www.answer.ai/posts/2024-06-23-claudette-src.html

Lance DB with instructor

  • https://x.com/Prashant_Dixit0/status/1804789578437722416

Australia Compliance

  • https://www.itnews.com.au/news/data-and-digital-ministers-agree-to-national-ai-framework-609028
  • https://www.finance.gov.au/government/public-data/data-and-digital-ministers-meeting/national-framework-assurance-artificial-intelligence-government