Evaluation
- LangSmith article on evaluation, really good - https://docs.smith.langchain.com/concepts/evaluation
Karpathy tweet
- Karpathy notes on Apple via tweet, 11/06/24: Actually, really liked the Apple Intelligence announcement. It must be a very exciting time at Apple as they layer AI on top of the entire OS. A few of the major themes.
- Step 1 Multimodal I/O. Enable text/audio/image/video capability, both read and write. These are the native human APIs, so to speak.
- Step 2 Agentic. Allow all parts of the OS and apps to inter-operate via "function calling"; kernel process LLM that can schedule and coordinate work across them given user queries.
- Step 3 Frictionless. Fully integrate these features in a highly frictionless, fast, "always on", and contextual way. No going around copy pasting information, prompt engineering, or etc. Adapt the UI accordingly.
- Step 4 Initiative. Don't perform a task given a prompt, anticipate the prompt, suggest, initiate.
- Step 5 Delegation hierarchy. Move as much intelligence as you can on device (Apple Silicon very helpful and well-suited), but allow optional dispatch of work to cloud.
- Step 6 Modularity. Allow the OS to access and support an entire and growing ecosystem of LLMs (e.g. ChatGPT announcement).
- Step 7 Privacy. <3
- We're quickly heading into a world where you can open up your phone and just say stuff. It talks back and it knows you. And it just works. Super exciting and as a user, quite looking forward to it.
Prompting
Articles
- O'Reilly Article on Prompting
- Eugene Yan on Prompting
- Andrew Ng on Prompting
- So you think you can prompt
Strategies
Books
Evaluation
Tools
- Evaluation Links
- Eugene Yan on Evaluation
- List of Evaluation/Observability Tooling
- TinyLLM seems particularly interesting
- Logfire from Pydantic
- Logfire Blog Post
Papers
Engineering and Development
Iterative Loop
Libraries
- Cameron Wolfe on Prompt Engineering and RAG
- Jeremy Howard Resources
- LM Hackers Notebook
- Jeremy Howard Tweet
- Lumentis Library
- Marker Library
Education
Papers
Blogs
Safety and Standards
Safety Guidelines
Standards
AI Product Strategy and Portfolio
Strategy Documents
Tools and Resources
Reflection Tools
Instructor Use Cases
Knowledge Graphs
GPT Researcher
Companies to Watch
Routing and Agentic RAG
- Leah Bonser on AI Planner Agent
- Leah also wrote a second post.
- LlamaIndex Router Stuff
- Signed up to the Course in DeepLearning.ai
Miscellaneous
AI agent infrastructure
- https://www.madrona.com/the-rise-of-ai-agent-infrastructure/
Goals
Agent Collusion
Consulting Proposal
McKinsey State of AI
AI Speakers
Shadow AI
- This keeps coming up as a thing.
Few-Shot Learning
Perplexity Pages
O'Reilly Article on prompting
https://www.oreilly.com/radar/what-we-learned-from-a-year-of-building-with-llms-part-i/
https://t.co/UT3pqK9uK67
https://applied-llms.org/
Evaluation links
- https://hamel.dev/blog/posts/evals/
- https://eugeneyan.com/writing/evals/
Think I need a page just on evaluation
Prompting
- https://eugeneyan.com/writing/prompting/
Goals
- https://www.loom.com/share/d0ff7c0b17b34aa7b46f9537c5b25785?start_download=true
I really like this idea of taking two companies, a vendor and a potential client, and making the vendor's "Goals" [as I see them] relevant to the client to help in sales calls.
Engineering
Hrishi Olickel's iterative loop of Chat, Play, Loop, and Nest guides a lot of my thinking on how to build agentic workflows. In general, the build principles I follow are:
- Pipelines > Prompts
- Inputs > Outputs
- APIs > Abstractions
- Multimodal > Text
The table below collects more specific technique tips and tricks - summarised from Hrishi's talks - across the build cycle.
Stage | Techniques |
---|---|
Planning & Design | - Iterative Loop: • Chat: Explore options • Play: Edit in OpenAI playground (60% time) • Loop: Add data and test cases • Nest: Simplify with subtasks - Time Allocation: • Playing: 60% • Prompt Tuning: 20% • Input Massaging: 10% • Coding: 10% • Tooling: 1% |
Development | - Dos: • Use all modalities • Code for input and output structure • Leverage pretraining information • Reduce search space upfront - Don'ts: • Add abstractions between yourself and the LLM • Stick to one model • Have too high an I/O ratio |
Testing & Debugging | - Debugging: • Start at the prompt level or try a different model • Transform input and add more structure to output • Classify tasks and errors to identify where the failure is - Optimizing AI: • Break prompts to reduce complexity • Use separate models for structure and long-form writing • Implement state management and self-healing |
Deployment & Scaling | - Future Planning: • Assume everything will be 10x to 50x cheaper and faster in the future • Build for at least 6 months ahead - AI Resources: • Be mindful of the exponential scaling of attention algorithms |
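
A minimal sketch of the "Nest" step and the "code for input and output structure" tip from the table above, under stated assumptions: `LLM` is a hypothetical stand-in for any model call (prompt in, text out), and `Ticket`, `classify`, `summarise`, `triage`, and `fake_llm` are illustrative names, not taken from Hrishi's talks. It shows one large prompt split into two narrow subtasks, with the output space reduced up front and the result returned as a typed structure.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class Ticket:
    """Structured output for the whole pipeline."""
    category: str
    summary: str


def classify(llm: LLM, text: str) -> str:
    """Subtask 1: a narrow prompt with a closed set of answers."""
    allowed = {"billing", "bug", "other"}
    answer = llm(f"Classify as one of {sorted(allowed)}:\n{text}").strip().lower()
    # Reduce the search space: anything unexpected falls back to "other".
    return answer if answer in allowed else "other"


def summarise(llm: LLM, text: str) -> str:
    """Subtask 2: a separate, equally narrow prompt."""
    return llm(f"Summarise in one sentence:\n{text}").strip()


def triage(llm: LLM, text: str) -> Ticket:
    """The nested pipeline: two small prompts instead of one large one."""
    return Ticket(category=classify(llm, text), summary=summarise(llm, text))


def fake_llm(prompt: str) -> str:
    """Deterministic stub so the sketch runs without an API key."""
    if prompt.startswith("Classify"):
        return " Billing \n"
    return "Customer was charged twice. "


ticket = triage(fake_llm, "I was charged twice this month.")
print(ticket)
```

Passing the model in as a plain callable keeps the pipeline free of any framework, which fits the "APIs > Abstractions" principle, and makes it easy to swap models per subtask.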
Andrew Ng
https://www.deeplearning.ai/the-batch/issue-249/
When building complex workflows, I see developers getting good results with this process:
- Write quick, simple prompts and see how it does.
- Based on where the output falls short, flesh out the prompt iteratively. This often leads to a longer, more detailed, prompt, perhaps even a mega-prompt.
- If that’s still insufficient, consider few-shot or many-shot learning (if applicable) or, less frequently, fine-tuning.
- If that still doesn’t yield the results you need, break down the task into subtasks and apply an agentic workflow.
Prompting strategies: https://arxiv.org/abs/2311.16452
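
A small sketch of the first escalation in Andrew Ng's process above, going from a quick, simple prompt to a few-shot prompt. It assumes a plain "Input/Output" text layout, which is one common convention rather than anything prescribed in the article; `build_prompt` and the sentiment examples are hypothetical.

```python
from typing import Sequence


def build_prompt(
    instruction: str,
    query: str,
    examples: Sequence[tuple[str, str]] = (),
) -> str:
    """Assemble a prompt; with no examples this is the quick, simple version."""
    parts = [instruction.strip()]
    # Each worked example becomes an Input/Output pair ahead of the real query.
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


instruction = "Label the sentiment as positive or negative."

# Step 1: start simple.
zero_shot = build_prompt(instruction, "Loved it")

# Step 3: add examples only once the simple prompt falls short.
few_shot = build_prompt(
    instruction,
    "Loved it",
    examples=[("Terrible service", "negative"), ("Works great", "positive")],
)
print(few_shot)
```

Keeping the examples as data rather than hand-written prompt text means the same function covers zero-shot, few-shot, and many-shot variants.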
Hrishi Budget
https://olickel.com/object-oriented-large-language-models
Cameron Wolfe
Many useful ideas in his writing on prompt engineering and RAG: https://cameronrwolfe.substack.com/
Reflection
A reflection tool, at its simplest, is about thinking over what was done and trying to improve it. Evaluation is a subset of reflection, or maybe the other way around. Here's a great list of the evaluation/observability tooling available:
- https://ianww.com/llm-tools
- tinyllm seems particularly interesting to me
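
A minimal sketch of reflection as described above: look at what was done, then try to improve it. Here the critique step plays the role of evaluation inside the loop. All names (`reflect`, `too_long`, `drop_last_word`) are hypothetical, and the critique and revise functions are offline stand-ins for what would normally be model calls.

```python
from typing import Callable, Optional


def reflect(
    draft: str,
    critique: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Critique and revise a draft until there is no feedback or rounds run out.

    Returns the final text and the number of revisions made.
    """
    rounds = 0
    while rounds < max_rounds:
        feedback = critique(draft)  # the evaluation step
        if feedback is None:        # nothing left to fix
            break
        draft = revise(draft, feedback)
        rounds += 1
    return draft, rounds


# Hypothetical stand-ins for model calls, so the sketch runs offline.
def too_long(text: str) -> Optional[str]:
    return "shorten" if len(text.split()) > 3 else None


def drop_last_word(text: str, feedback: str) -> str:
    return " ".join(text.split()[:-1])


final, rounds = reflect("one two three four five", too_long, drop_last_word)
print(final, rounds)
```

The `max_rounds` cap matters in practice: a critic that never returns "no feedback" would otherwise loop forever.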
I also want to look into Logfire from Pydantic:
- https://pydantic.dev/logfire
- https://python.useinstructor.com/blog/2024/05/01/instructor-logfire/
An evaluation paper
- https://arxiv.org/abs/2404.12272
Instructor use cases I want to explore
- https://x.com/ddebowczyk/status/1792105564966662523
- https://x.com/jxnlco/status/1788558053094117691 [Some RAG thing I need to review]
- In the docs there is a create_iterable method instead of Iterable[T] - https://github.com/jxnl/instructor?tab=readme-ov-file#streaming-iterables-create_iterable
Other libraries I'm thinking through
CSIRO links on Safety
- https://research.csiro.au/ss/team/se4ai/responsible-ai-engineering/
- https://research.csiro.au/ss/team/se4ai/
Routing [Probably more urgent that I read the Leah Post]
- https://www.linkedin.com/pulse/how-create-good-ai-planner-agent-leah-bonser-mvrsf/
- Note Leah also wrote a second post.
Jeremy Howard Stuff I need to read
- https://www.youtube.com/watch?v=jkrNMKz9pWU
- https://github.com/fastai/lm-hackers/blob/main/lm-hackers.ipynb
- https://x.com/HamelHusain/status/1793319488731107718
LlamaIndex router stuff
- https://medium.com/aimonks/agentic-rag-with-llama-index-router-query-engine-01-381e83a418af
- Have signed up to the course in deeplearning.ai
AI Product Strategy for portfolio
Ton of awesome AI speakers
https://x.com/HamelHusain/status/1792223793903276343
Prompt book coming out
- https://www.oreilly.com/library/view/prompt-engineering-for/9781098156145/
Agent collusion
https://arxiv.org/abs/2404.00806
Hamel Husain
Consulting proposal from a meeting : https://gist.github.com/hamelsmu/ac72d18ee9d4cbd6a235a8e37a75f303
Australian safety standards
- https://www.cyber.gov.au/resources-business-and-government/governance-and-user-education/artificial-intelligence/deploying-ai-systems-securely
Education
https://towardsdatascience.com/ai-knocking-on-the-classrooms-door-87db39d00b94
GPT Researcher
- 1 : https://x.com/hu_yifei/status/1793751353308901394
- 2 : https://github.com/assafelovic/gpt-researcher
Education
- Google education paper, worth reading. Long and dense: https://storage.googleapis.com/deepmind-media/LearnLM/LearnLM_paper.pdf
- Donald Clark Blog - https://donaldclarkplanb.blogspot.com/
- https://www.ai-supremacy.com/p/ai-in-education-googles-learnlm-product
Shadow AI
- this keeps coming up as a thing
Companies to watch
- https://www.studyfetch.com/
Few shot tool use doesn't really work yet
- https://research.google/blog/few-shot-tool-use-doesnt-really-work-yet/
Perplexity pages
- https://www.perplexity.ai/hub/faq/what-is-perplexity-pages
Knowledge graphs
- https://gradientflow.com/graphrag-design-patterns-challenges-recommendations/
Evaluation
- https://x.com/eugeneyan/status/1796300181186732522
McKinsey state of AI
https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2023-generative-ais-breakout-year
Aha moments for voice
Some specific and niche examples of potential aha moments:
1. Product Manager's Design Sprint
A product manager is leading a design sprint. While walking, they interact with the LLM to brainstorm ideas, outline user personas, and sketch out user journeys. The LLM helps by offering creative suggestions, asking probing questions, and even generating wireframes. This turns what would be idle walking time into productive, ideation-rich sessions.
2. Medical Researcher Reviewing Clinical Trials
A medical researcher on their way to the lab uses their walk to review the latest clinical trial data. The LLM summarizes complex medical journals, highlights key findings, and answers specific queries about methodologies or results. This ensures that the researcher is well-prepared and updated, maximizing their time efficiently.
3. Job Seeker Practicing Industry-Specific Interviews
An IT professional seeking a new job uses their morning walks to practice for interviews. The LLM conducts mock interviews, focusing on technical questions related to the latest programming languages and frameworks. It also provides feedback on responses and suggests improvements, helping the job seeker feel more confident and prepared.
4. Startup Founder Refining a Pitch
A startup founder uses their evening walks to refine their pitch. The LLM assists by simulating investor questions, helping to craft compelling narratives, and suggesting data points to include. This iterative process helps the founder fine-tune their pitch deck and elevator speech, making them better prepared for investor meetings.
5. Musician Composing and Arranging Music
A musician on a stroll uses voice interactions with an LLM to compose and arrange music. They hum melodies, and the LLM helps to notate the music, suggest chord progressions, and even generate accompanying parts for other instruments. This transforms a simple walk into a creative session, leading to new compositions and arrangements.
6. Author Developing a Novel Plot
An author uses their walks to develop novel plots. The LLM assists by discussing character development, plot twists, and thematic elements. The author can voice their ideas, and the LLM offers constructive feedback, alternative scenarios, and even dialogue snippets. This keeps the author in a creative flow, turning walks into productive brainstorming sessions.
These examples illustrate how specific, detailed interactions with an LLM can turn otherwise idle walking time into highly productive and insightful sessions tailored to various professional and creative needs.
Idea - Voice enabled
- AI Powered design sprints with voice
- walkandtalk.ai
Pattern for building LLM applications (Hrishi)
- https://simonwillison.net/2024/Apr/9/a-solid-pattern-to-build-llm-applications/
- https://youtu.be/8w0hUcQSDy8?si=VvwvHVuUv94aIzuu
Agent design patterns
- https://arxiv.org/abs/2405.10467
- agent reference architecture - https://arxiv.org/abs/2311.13148
- https://www.youtube.com/watch?v=MXPYbjjyHXc
recommendation systems
- https://www.buildingrecsys.com/
Advancing to agents
- https://www.youtube.com/watch?v=MXPYbjjyHXc
Prompt report taxonomy
https://arxiv.org/abs/2406.06608
Really interesting infrastructure market map
https://www.bvp.com/atlas/roadmap-ai-infrastructure
AI regulation by Jeremy Howard
https://www.answer.ai/posts/2024-06-11-os-ai.html
Langgraph
- Explanation - https://www.youtube.com/watch?v=hvAPnpSfSGo
Really good podcast on agents by Harrison Chase
https://www.youtube.com/watch?v=6XZLoW0-mPY&t
Spiral - Business idea
- https://www.youtube.com/watch?v=iZw5GHuR9IY
- https://spiral.computer/
Notebooks
https://youtu.be/-kdl04xqasY
Live podcast on O'Reilly LLMs
https://www.youtube.com/watch?v=c0gcsprsFig&t=2839s
Business Idea : Auto convert market map into a nice UI
https://app.dealroom.co/lists/46345
Claudette :
- https://www.youtube.com/watch?v=p_8Zk6HUCV8
- https://www.answer.ai/posts/2024-06-23-claudette-src.html
Lance DB with instructor
- https://x.com/Prashant_Dixit0/status/1804789578437722416
Australia Compliance
- https://www.itnews.com.au/news/data-and-digital-ministers-agree-to-national-ai-framework-609028
- https://www.finance.gov.au/government/public-data/data-and-digital-ministers-meeting/national-framework-assurance-artificial-intelligence-government