Deep dive into GPTs’ capabilities

Description
Before the launch of the GPT Store, let’s push imagination to its limits
Published
November 17, 2023
Tags
AI
Custom versions of ChatGPT, officially known as GPTs, have drawn overwhelming attention since their release at OpenAI DevDay. Together with the upcoming GPT Store and a promise of revenue for creators, OpenAI is trying to build an App Store-like platform and ecosystem for AI.
Since then, tens of thousands of GPTs have been shared all over Twitter, with new ones arriving every day. Even unofficial GPT stores have started competing with one another.
So what exactly can we build with the so-called GPT Builder? Anything from simple prompt-based role playing to complex assistant systems or interactive multimedia games. It depends on what you want and, more importantly, on the capabilities of GPTs.
That is why this deep dive has two main purposes:
  • free creators’ imagination and show them the limitations
  • lift the proprietary cover off GPTs so the open source community can learn from them

GPTs deconstructed

Basically, a GPT is ChatGPT plus Instructions, Knowledge, Tools and Actions, where
  • ChatGPT defaults to the multimodal GPT-4 Turbo model
  • Instructions serve as the system prompt
  • Knowledge is the set of files you upload for retrieval
  • Tools are the most-used internal plugins from the beta:
    • Web browsing
    • DALL-E image generation
    • Code interpreter
  • Actions replace the former plugins for external interactions

Vision Ready

Since it took OpenAI a long time to roll out GPT-4V to all Plus users, excitement about the multimodality of GPTs has stayed low compared with other features. So when creating a GPT, keep in mind that your users can upload a picture or any other supported file, which the underlying model is able to understand. This should help you move beyond the mindset that language/chat has to be the default UX.

Security of prompts & knowledge

Prompt engineering is something we’ve been talking about since March, and RAG is what we’ve been doing in real-world cases. So things we did before can now be accomplished easily.
Points to add:
  • GPT Builder provides an AI-native way to generate good prompts for you
  • OpenAI’s knowledge solution performs well in many cases, though some users have reported that it fails on long contexts such as books. Depending on your situation, a black box is fine until you need privacy and flexibility
  • Keep in mind that neither prompts nor knowledge (uploaded docs) are completely secure, at least for now

Code Interpreter

Here comes my favorite part. Code Interpreter, also known as Advanced Data Analysis (and called Code Interpreter during the beta), provides a sandbox environment where GPTs can run Python scripts generated for your needs. If you are familiar with Jupyter Notebook, you already know what it can do. As an interactive Python playground, the sandbox can do calculations, produce data visualizations and, most importantly, run any script you can get GPT to write, which leaves creators room to tweak even though limitations exist.

State & Reset

Good news: Code Interpreter is stateful throughout a whole conversation, which means it remembers imported modules, previous variables and local files, just like a running Jupyter notebook.
Bad news: the sandbox environment has a finite session life and gets reset after a certain period of inactivity. In my tests it never persisted longer than 10 hours.
Obviously sandbox environments come at a cost and get collected to save resources. What does this limited statefulness have to do with creating GPTs? Well, say you are building a game GPT which counts on Code Interpreter to calculate and store players’ progress. Players then either finish the game before going to bed or wake up in the morning having lost everything. Not to mention data analysis scenarios where you keep losing intermediate results.
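One way to soften this limitation is to have the GPT save player progress to a file that can be downloaded and uploaded again later. The following is only a minimal sketch under that assumption, not something tested in this deep dive: the function names `save_progress` and `load_progress`, the `progress.json` file name and the shape of the state dictionary are all hypothetical.

```python
import copy
import json
import tempfile
from pathlib import Path


def save_progress(state: dict, path: Path) -> Path:
    """Write the game state to a JSON file and return its path."""
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return path


def load_progress(path: Path, default: dict) -> dict:
    """Read the game state back, or start fresh if the file is gone."""
    if not path.exists():
        # The sandbox was reset (or nothing was saved yet): return a copy
        # of the default state so the caller can modify it safely.
        return copy.deepcopy(default)
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    new_game = {"level": 1, "score": 0, "inventory": []}  # hypothetical state
    with tempfile.TemporaryDirectory() as tmp:
        save_file = Path(tmp) / "progress.json"  # hypothetical file name
        state = load_progress(save_file, new_game)  # nothing saved yet
        state["level"] = 3
        state["inventory"].append("lantern")
        save_progress(state, save_file)
        print(load_progress(save_file, new_game))
```

A temporary directory is used here so the sketch runs anywhere; inside the sandbox the file would live wherever Code Interpreter keeps files for the conversation, and the player would still need to download it before the session expires.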

Network Access

Although GPTs are able to browse the Internet via the web browsing tool, network access in Code Interpreter has been turned off. If you give a GPT an API to request, the call always fails even though the code is generated successfully, with a message like:
Currently, the sandbox environment I operate in does not have the capability to access external APIs or the internet. This is because the environment is designed to be secure and isolated, preventing any external web requests.
Funny story: sklearn ships two kinds of datasets, small toy datasets that can be loaded via data loaders like load_iris, and large real-world datasets that have to be downloaded via data fetchers like fetch_california_housing. So in Code Interpreter you only have access to the toy datasets, since the fetchers won’t work.
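To check the isolation yourself, you could ask the GPT to run a small probe before attempting any download. This is a hedged sketch using only the standard library; the helper name `has_network` and the default target (the public DNS resolver at 1.1.1.1, port 53) are my assumptions, not something taken from the sandbox itself.

```python
import socket


def has_network(host: str = "1.1.1.1", port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    if has_network():
        print("Network reachable: data fetchers and API calls may work.")
    else:
        print("No network: stick to bundled data and uploaded files.")
```

Using a raw IP address avoids a DNS lookup, so the probe should fail fast in an isolated environment instead of hanging.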

Installed Packages

Without Internet access, what you can do with Code Interpreter is pretty much limited to the packages that are pre-installed. So let’s see the output of pip list:
I'm unable to run system commands like pip list directly. However, I can still assist with Python code that doesn't rely on system commands or external internet access.
Oops, system commands fail too.
How about listing the packages with a Python script?
    # generated by GPT-4
    import pkg_resources

    # Collect "name==version" for every installed distribution, sorted by name
    installed_packages = pkg_resources.working_set
    installed_packages_list = sorted(["%s==%s" % (i.key, i.version) for i in installed_packages])
    for package in installed_packages_list:
        print(package)
Package List
absl-py==2.0.0 affine==2.4.0 aiohttp==3.8.1 aiosignal==1.3.1 analytics-python==1.4.post1 anyio==3.7.1 anytree==2.8.0 argcomplete==1.10.3 argon2-cffi-bindings==21.2.0 argon2-cffi==23.1.0 arviz==0.15.1 asn1crypto==1.5.1 asttokens==2.4.0 async-timeout==4.0.3 attrs==23.1.0 audioread==3.0.1 babel==2.13.0 backcall==0.2.0 backoff==1.10.0 backports.zoneinfo==0.2.1 basemap-data==1.3.2 basemap==1.3.2 bcrypt==4.0.1 beautifulsoup4==4.8.2 bleach==6.1.0 blinker==1.6.3 blis==0.7.11 bokeh==2.4.0 branca==0.6.0 brotli==1.1.0 cachetools==5.3.1 cairocffi==1.6.1 cairosvg==2.5.2 camelot-py==0.10.1 catalogue==2.0.10 certifi==2019.11.28 cffi==1.16.0 chardet==4.0.0 charset-normalizer==2.1.1 click-plugins==1.1.1 click==8.1.7 cligj==0.7.2 cloudpickle==3.0.0 cmudict==1.0.15 comm==0.1.4 compressed-rtf==1.0.6 countryinfo==0.1.2 cryptography==3.4.8 cssselect2==0.7.0 cycler==0.12.1 cymem==2.0.8 cython==0.29.36 databricks-sql-connector==0.9.1 dbus-python==1.2.16 debugpy==1.8.0 decorator==4.4.2 defusedxml==0.7.1 deprecat==2.1.1 dill==0.3.7 distro-info==0.23+ubuntu1.1 dlib==19.22.1 dnspython==2.4.2 docx2txt==0.8 ebcdic==1.1.1 ebooklib==0.18 einops==0.3.2 email-validator==2.1.0 entrypoints==0.4 et-xmlfile==1.1.0 exceptiongroup==1.1.3 exchange-calendars==3.4 executing==2.0.0 extract-msg==0.28.7 faker==8.13.2 fastapi==0.95.2 fastjsonschema==2.18.1 fastprogress==1.0.3 ffmpeg-python==0.2.0 ffmpy==0.3.1 filelock==3.12.4 fiona==1.8.20 flask-cachebuster==1.0.0 flask-cors==4.0.0 flask-login==0.6.2 flask==3.0.0 folium==0.12.1 fonttools==4.43.1 fpdf==1.7.2 frozenlist==1.4.0 future==0.18.3 fuzzywuzzy==0.18.0 gensim==4.1.0 geographiclib==1.52 geopandas==0.10.2 geopy==2.2.0 gradio==2.2.15 graphviz==0.17 gtts==2.2.3 h11==0.14.0 h2==4.1.0 h5netcdf==1.1.0 h5py==3.6.0 hpack==4.0.0 html5lib==1.1 httpcore==0.18.0 httptools==0.6.1 httpx==0.25.0 hypercorn==0.14.3 hyperframe==6.0.1 idna==2.8 imageio-ffmpeg==0.4.9 imageio==2.31.6 imapclient==2.1.0 imgkit==1.2.2 importlib-metadata==6.8.0 importlib-resources==6.1.0 
iniconfig==2.0.0 ipykernel==6.25.2 ipython-genutils==0.2.0 ipython==8.12.3 isodate==0.6.1 itsdangerous==2.1.2 jax==0.2.28 jedi==0.19.1 jinja2==3.1.2 joblib==1.3.2 json5==0.9.14 jsonpickle==3.0.2 jsonschema-specifications==2023.7.1 jsonschema==4.19.1 jupyter-client==7.4.9 jupyter-core==5.1.3 jupyter-server==1.23.5 jupyterlab-pygments==0.2.2 jupyterlab-server==2.19.0 jupyterlab==3.4.8 keras==2.6.0 kerykeion==2.1.16 kiwisolver==1.4.5 korean-lunar-calendar==0.3.1 librosa==0.8.1 llvmlite==0.41.1 loguru==0.5.3 lxml==4.9.3 markdown2==2.4.10 markdownify==0.9.3 markupsafe==2.1.3 matplotlib-inline==0.1.6 matplotlib-venn==0.11.6 matplotlib==3.4.3 mistune==3.0.2 mizani==0.9.3 mne==0.23.4 monotonic==1.6 moviepy==1.0.3 mpmath==1.3.0 mtcnn==0.1.1 multidict==6.0.4 munch==4.0.0 murmurhash==1.0.10 mutagen==1.45.1 nashpy==0.0.35 nbclassic==0.4.5 nbclient==0.8.0 nbconvert==7.9.2 nbformat==5.9.2 nest-asyncio==1.5.8 networkx==2.6.3 nltk==3.6.3 notebook-shim==0.2.3 notebook==6.5.1 numba==0.58.1 numexpr==2.8.6 numpy-financial==1.0.0 numpy==1.21.2 odfpy==1.4.1 olefile==0.46 opencv-python==4.5.2.54 openpyxl==3.0.10 opt-einsum==3.3.0 orjson==3.9.9 oscrypto==1.3.0 packaging==23.2 pandas==1.3.2 pandocfilters==1.5.0 paramiko==3.3.1 parso==0.8.3 pathy==0.10.3 patsy==0.5.3 pdf2image==1.16.3 pdfkit==0.6.1 pdfminer.six==20191110 pdfplumber==0.5.28 pdfrw==0.4 pexpect==4.8.0 pickleshare==0.7.5 pillow==8.3.2 pip==20.0.2 pkgutil-resolve-name==1.3.10 platformdirs==3.11.0 plotly==5.3.0 plotnine==0.10.1 pluggy==1.3.0 pooch==1.7.0 preshed==3.0.9 priority==2.0.0 proglog==0.1.10 prometheus-client==0.17.1 prompt-toolkit==3.0.39 pronouncing==0.2.0 psutil==5.9.6 ptyprocess==0.7.0 pure-eval==0.2.2 py==1.11.0 pyaudio==0.2.11 pycountry==20.7.3 pycparser==2.21 pycryptodome==3.19.0 pycryptodomex==3.19.0 pydantic==1.10.2 pydot==1.4.2 pydub==0.25.1 pydyf==0.8.0 pygments==2.16.1 pygobject==3.36.0 pygraphviz==1.7 pyjwt==2.8.0 pylog==1.1 pyluach==2.2.0 pymc3==3.11.5 pymupdf==1.19.6 pynacl==1.5.0 pyopenssl==21.0.0 
pypandoc==1.6.3 pyparsing==3.1.1 pypdf2==1.28.6 pyphen==0.14.0 pyproj==3.5.0 pyprover==0.5.6 pyshp==2.1.3 pyswisseph==2.10.3.2 pytesseract==0.3.8 pytest==6.2.5 pyth3==0.7 python-apt==2.0.1+ubuntu0.20.4.1 python-dateutil==2.8.2 python-docx==0.8.11 python-dotenv==1.0.0 python-multipart==0.0.6 python-pptx==0.6.21 pyttsx3==2.90 pytz==2023.3.post1 pywavelets==1.4.1 pyxlsb==1.0.8 pyyaml==6.0.1 pyzbar==0.1.8 pyzmq==25.1.1 qrcode==7.3 rarfile==4.0 rasterio==1.2.10 rdflib==6.0.0 referencing==0.30.2 regex==2023.10.3 reportlab==3.6.1 requests-unixsocket==0.2.0 requests==2.31.0 resampy==0.4.2 rpds-py==0.10.6 scikit-image==0.18.3 scikit-learn==1.0 scipy==1.7.1 seaborn==0.11.2 semver==3.0.2 send2trash==1.8.2 sentencepiece==0.1.99 setuptools==45.2.0 shap==0.39.0 shapely==1.7.1 six==1.14.0 slicer==0.0.7 smart-open==6.4.0 sniffio==1.3.0 snowflake-connector-python==2.7.12 snuggs==1.4.7 sortedcontainers==2.4.0 soundfile==0.10.2 soupsieve==2.5 spacy-legacy==3.0.12 spacy==3.1.6 speechrecognition==3.8.1 srsly==2.4.8 stack-data==0.6.3 starlette==0.27.0 statsmodels==0.13.1 svglib==1.1.0 svgwrite==1.4.1 sympy==1.8 tables==3.6.1 tabula==1.0.5 tabulate==0.8.9 tenacity==8.2.3 terminado==0.17.1 text-unidecode==1.3 textblob==0.15.3 textract==1.6.4 theano-pymc==1.1.2 thinc==8.0.17 threadpoolctl==3.2.0 thrift==0.16.0 tifffile==2023.7.10 tinycss2==1.2.1 toml==0.10.2 tomli==2.0.1 toolz==0.12.0 torch==1.10.0 torchaudio==0.10.0 torchtext==0.6.0 torchvision==0.11.1 tornado==6.3.3 tqdm==4.64.0 traitlets==5.11.2 trimesh==3.9.29 typer==0.4.2 typing-extensions==4.5.0 tzlocal==5.2 ujson==5.8.0 unattended-upgrades==0.1 urllib3==1.25.8 uvicorn==0.23.2 uvloop==0.19.0 wand==0.6.11 wasabi==0.10.1 watchfiles==0.21.0 wcwidth==0.2.8 weasyprint==53.3 webencodings==0.5.1 websocket-client==1.6.4 websockets==10.3 werkzeug==3.0.0 wheel==0.34.2 wordcloud==1.8.1 wrapt==1.15.0 wsproto==1.2.0 xarray-einstats==0.5.1 xarray==2023.1.0 xgboost==1.4.2 xlrd==1.2.0 xlsxwriter==3.1.9 xml-python==0.4.3 yarl==1.9.2 zipp==3.17.0 
zopfli==0.2.3
Which ones catch your eye? For me:
  • ffmpeg-python: the ability to process multimedia behind the scenes
  • python-pptx: prototyping slides for your presentation (I actually made a Slides Builder based on it)
  • qrcode: playing with QR codes
And then there are all the data visualization & machine learning kits … The sky is the limit once you understand those handy tools.
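Since the pre-installed set may change over time, a GPT can check for the packages it depends on before using them. The sketch below is one possible way to do that with the standard-library `importlib.metadata` module (available since Python 3.8, which I assume the sandbox provides) instead of `pkg_resources`; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The three packages highlighted above.
    wanted = ["ffmpeg-python", "python-pptx", "qrcode"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

A missing package shows up as `None` rather than raising, so the GPT can fall back to another approach or tell the user what is unavailable.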

Conclusion

Creating a custom GPT is a trade-off between imagination and limitation, i.e. what you want to do vs. what you can do. Aside from instructions and uploaded files, Code Interpreter gives creators a sandboxed Python environment with a session life of a few hours, 300+ pre-installed packages and zero network access.
And Actions? Wait for my next deep dive.

Further Reading